

A Journal of Statistics
for the Physical, Chemical 
and Engineering Sciences 




MAY 1959 


VOLUME 1 NUMBER 2 


Published quarterly by: 
The American Society for Quality Control
and The American Statistical Association








TECHNOMETRICS 


The purpose of Technometrics is to contribute to the development and use 
of statistical methods in the physical, chemical and engineering sciences. This 
objective places a high premium on succinct communication among the physi- 
cal scientist, engineer, statistician and mathematician. The journal will accept 
for publication papers describing new statistical techniques expected to be use- 
ful in these sciences, papers illustrating the application of known statistical 
methods to new or novel environments, expository or tutorial papers on partic- 
ular statistical methods, and papers dealing with the philosophy and problems 
of applying statistical methods to research, development, design and perform- 
ance. Brief descriptions of problems requiring solution and short technical 
notes will also be accepted for publication. Letters to the Editor, signed by the 
author and limited in length will be published when they are considered timely 
and appropriate. All papers should contain a short but clear summary of con- 
tents and conclusions, an expository section containing numerical examples 
whenever practicable, and appropriate additional sections relating to technical 
derivations.


Subscription Rates 


The annual subscription rate for members of the American Society for Quality Control
or the American Statistical Association is $6.00 a year. The annual subscription rate
for non-members is $8.00 a year.

Members of the sponsoring societies may subscribe to the journal while paying their 
annual dues or by check or money order made out to Technometrics and mailed as follows: 


for the American Society for Quality Control 


Wm. P. Youngclaus, Jr.
Room 6197 Plankinton Building 
161 West Wisconsin Avenue 
Milwaukee 3, Wisconsin 


for the American Statistical Association 
Edgar M. Bisgyer
Room 404, Beacon Bldg. 
1757 K Street N.W. 
Washington 6, D. C. 
Non-members may subscribe by check or money order made out to Technometrics 
and mailed to either of the above addresses or to the office of the editor:
J. Stuart Hunter
167 Nassau Street 
Princeton, N. J. 


All subscription fees are payable in the currency of the United States of America. 
Communications concerning changes of address, subscriptions, back numbers, etc.,
should be sent to the office through which the annual subscription fee is paid. Whenever
possible a copy of the address taken from an issue of the periodical should accompany a
change in address request.

Communication concerning membership in either of the sponsoring societies should be sent
to that society.


Application to mail at Second Class postage rate is pending 
at Richmond, Va. and at Additional Mailing Office. 





TECHNOMETRICS


A Journal of Statistics for the Physical, Chemical and Engineering Sciences 


Published Quarterly by 
THE AMERICAN SOCIETY FOR QUALITY CONTROL 
and the 
AMERICAN STATISTICAL ASSOCIATION 


Editor 


J. Stuart Hunter


Associate Editors 
G. A. Barnard          Besse B. Day
C. A. Bennett          R. J. Hader
Cuthbert Daniel        Martin Wilk
Marvin Zelen


Management Committee 


Paul S. Olmstead, Chairman


For the For the 
American Society for Quality Control American Statistical Association 


J. Y. McClure
Maynard Renner


Irving Burr
Almarin Phillips
H. L. Wenery                         Donald C. Riley


Published Quarterly in February, May, August and November by the 
Technometrics Management Committee of the American Society for Quality 
Control and the American Statistical Association. Editorial Office: 167 Nassau 
St., Princeton, New Jersey. Publication Office: Wm. Byrd Press, P. O. Box 2W,
Richmond 5, Virginia. Second class mailing privileges granted at Richmond,
Virginia.


Composed and Printed at the
Wm. Byrd Press, Inc., Richmond, Virginia, U.S.A.





CONTENTS 


TECHNOMETRICS, VOL. 1, No. 2, MAY 1959 


Measurements Made by Matching with Known Standards 
W.J. Youden, W.S. Connor and N. C. Severo 


Random Balance Experimentation ......... F. E. Satterthwaite


The Application of Random Balance Designs...... T. A. Budne 


Discussion of the Papers of Messrs. Satterthwaite and Budne 
W. J. Youden, Oscar Kempthorne, J. W. Tukey, 
G. E. P. Box and J. S. Hunter 


Quick Analysis Methods for Random Balance 
Screening Experiments................. F. J. Anscombe 


Notices 





TECHNOMETRICS May, 1959 


Editor’s Note 


The subject of Random Balance Experimentation has attracted considerable
interest and concern over the past few years. One of the difficulties encountered 
in trying to discuss problems surrounding random balance has been the lack of 
published technical papers on the subject. As a consequence, Dr. F. E. Satterth- 
waite, who originated the concept of random balance experimentation, and Mr. 
T. A. Budne, an experienced user of the methodology, were asked to present the 
case for random balance experimentation at the joint meeting of the Biometrics 
Society, the Institute of Mathematical Statistics and the Section of Physical 
and Engineering Sciences of the American Statistical Association held in Pitts- 
burgh in March, 1959. Before the meeting, copies of the two contributed papers, 
‘Random Balance Experimentation’ by Dr. Satterthwaite and ‘The Application 
of Random Balance Designs’ by Mr. Budne, were sent to Messrs. F. J. Anscombe, 
J. W. Tukey, O. Kempthorne, W. J. Youden, G. E. P. Box, and others, along 
with an invitation to discuss the papers at the Pittsburgh meetings. Professor 
W. G. Cochran agreed to chair the session. The papers and portions of the 
discussion appear in this issue of Technometrics. Professor Anscombe subse- 
quently submitted a full paper. The comments of the Editor, which also appear 
in the discussion, were written after the meetings. The two contributed papers 
were not refereed. 

Considerable precedent exists for publishing papers along with discussion. 
The Editors of the Journal of the Royal Statistical Society, Series B have made 
this a regular practice for many years. It is hoped that future issues of Technometrics
will contain similar material. On such occasions, and of course with the
approval of the author, a contributed paper will be refereed and accepted for 
publication. Galleys of the article will then be distributed to interested members 
of the societies along with an invitation to present comments at a meeting where 
the paper is to be read. These comments, which should be brief and carefully
prepared, will then be given following the presentation of the paper and both
the paper and comments published. Additional discussion from the floor, when 
appropriate, will also appear in the journal. In this manner, it is hoped that 
Technometrics can contribute to the interest in technical papers presented before 
the societies. Suggestions for papers to be presented at such meetings would be 
very welcome. Further written discussion of the subject of Random Balance 
Experimentation is also welcome. 

To make room for the discussion and rebuttal comments on the random 
balance papers the following articles, originally scheduled for this issue of Tech- 
nometrics, will appear in the next issue. 





1) Simplified Estimators for the Normal Distribution when the Samples are 
Singly Censored or Truncated...................... A. Clifford Cohen 

2) Control Chart Tests Based on Geometric Moving Averages
.................................................... S. W. Roberts


The Editor is grateful to these authors for their willingness to postpone publi- 
cation. 





TECHNOMETRICS 


Measurements Made by 
Matching with Known Standards 


W. J. Youden, W. S. Connor, and N. C. Severo
National Bureau of Standards 


Quick estimates of an unknown quantity are often made by matching the unknown
against a graded series of known standards. This paper discusses the problem of 
choosing the proper size and number of intervals in establishing a standard. Methods 
for determining the standard deviation of the matching process are described. 


SUMMARY


Quick estimates of an unknown quantity may be made by matching it against 
a series of prepared standards with known values. The matching procedure can 
be considered as a more or less drastic rounding off procedure applied to data 
with a given standard deviation. A curve has been prepared that shows how the 
proportion of times an operator successfully picks the standard closest to the 
unknown depends upon the interval between the standards. If known materials 
are available, an estimate of the standard deviation of the matching procedure 
can be made using the proportion of correct matches. A second curve shows 
how the proportion of times two independent operators agree in the standard 
selected depends upon the interval between the standards. The proportion of 
unknowns on which two operators agree can also be used to estimate the standard 
deviation. 


THE MATCHING PROBLEM


Quick estimates of an unknown quantity may be made by matching the un- 
known with one of a graded series of standards with known values. Matching 
techniques usually require a minimum of apparatus and often have the advantage 
of portability so that measurements may be made on the spot in field investiga- 
tions. One of the most common measurements made by matching is the deter- 
mination of the acidity of solutions. Usually a series of standard solutions, 
differing in pH by steps of 0.2 pH, are prepared by mixing appropriate stock
solutions. On addition of a dye, the solutions take on a graded sequence in 
color [1]. The same dye is added to the unknown solution and the reference 
standard with the closest color match provides an immediate estimate for the 
unknown solution. 

This paper was prompted by an inquiry from the U. S. Geological Survey
concerning the choice of the interval separating the reference standards in a 
matching method for estimating amounts of uranium in soil samples. 

A small number of reference standards means in turn a large interval between
successive standards. Certainly, a large interval between standards makes it
easier for the operator to make a correct match. A correct match is defined as 
selecting that standard whose stated value is closer to the true value of the 
unknown than the stated value of any other standard. Large intervals reduce the 
discriminatory power of the method. Small intervals increase the number of 
standards needed and may slow down the work by making it more difficult to 
decide upon the best match. Indeed, as the interval between standards diminishes, 
the proportion of correct matches also decreases. This is obvious because un- 
knowns approximately midway between standards will be correctly matched at 
best half of the time. The more standards there are, the more unknowns there 
are in the difficult-to-judge zones midway between standards. 

It follows that the chance of selecting a correct match for an unknown depends 
on the interval between successive standards and the location of the unknown 
within an interval. The best chance for a successful match exists when the 
unknown coincides with one of the standards. This chance diminishes as the 
unknown gets farther and farther from a standard and drops to one half or less 
when the unknown is close to the midpoint between two standards. The average 
chance for correct solution, with a given distance between the standards, depends 
upon the distribution of the unknowns within the intervals. This will vary from 
interval to interval depending on the distribution of unknowns over the whole 
range of standards. But if all of the intervals were superimposed, and the densities 
of the unknown values were accumulated, the resulting distribution usually 
would be approximately rectangular. Thus while the heights of men are approxi- 
mately normally distributed, it is reasonable to expect, if men are measured to 
the nearest tenth of an inch, that there are just as many men whose heights 
terminate in one tenth as in five tenths or any other number of tenths. This is 
why a rectangular distribution across the interval is assumed in the derivations 
given in the last section. 
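The rectangular assumption can be checked with a small simulation. The sketch below is illustrative only: the mean of 69 inches and the standard deviation of 2.5 inches are assumed values that do not come from the paper, and `terminal_digit_shares` is a hypothetical helper name.

```python
import random
from collections import Counter

def terminal_digit_shares(n, mean=69.0, sd=2.5, seed=1):
    """Share of simulated heights ending in each tenth (0-9) after rounding to 0.1 inch.

    mean and sd are assumed values chosen only for illustration.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n):
        height = rng.gauss(mean, sd)     # normally distributed height
        tenths = round(height * 10)      # height in tenths of an inch, rounded
        counts[tenths % 10] += 1         # terminal digit of the recorded height
    return [counts[d] / n for d in range(10)]

shares = terminal_digit_shares(100_000)
print([round(s, 3) for s in shares])
```

Under these assumptions each terminal digit should take a share close to one tenth, even though the heights themselves are far from rectangular.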


MATCHING AND ROUNDING OFF


The matching technique is equivalent to a rounding off process on data. 
(The reader is referred to [2] for a discussion of the effect of rounding on estima- 
tion and testing procedures.) For example, suppose a careful assistant has 
recorded the heights of college freshmen to the nearest 0.1 inch. For convenience 
in studying the heights of college men in relation to residence, field of interest, 
etc., an investigator rounds off the heights to the nearest whole inch. The data 
then appear just as if the heights had been recorded by a matching process. All 
that would be needed for such a matching process is a row of standard men 
whose heights are, say, 64, 65, ..., 76 inches. Matching a freshman against one
of these standard men would provide an automatic rounding off of his height 
to the nearest inch. 

Consider a measurement system where the standard deviation σ is unity.
Although the data may have been recorded to the nearest tenth of a unit, the
operator may choose to round off all his data to the nearest even integer. This
is a drastic rounding off because the interval between rounded values is 2σ.
The problem now is to determine what proportion of the values are rounded
off to the even integer nearest to the unknown true value. Obviously for unknowns
that coincide with an even integer, all readings that stay within ±1σ will be
rounded off to the even integer nearest the unknown value. If the measurement
is normally distributed, the proportion correct is 0.6827. All readings that have
errors in the range +1σ to +3σ will be rounded off to the next higher even
integer. And a similar group of negative deviations will be rounded to the next
lower even integer.

For unknowns with values infinitesimally greater than an odd integer, the
nearest even integer is the even integer just following the odd integer. All
measurements with deviations from zero to +2σ will be rounded to the even integer
nearest the true value. Again assuming normality, the proportion correct is
0.4772. Deviations exceeding 2σ will be rounded to the next higher even number
and deviations from zero to −2σ will be rounded to the even number just below
the correct even number.

The expected value of the proportion of correct matches when the unknowns 
are equally likely to be in any part of the interval is the value sought. Its calcu- 
lation is described in a later section of the paper. The expected proportion 
correct is shown as a function of the width of the interval between standards 
in Figure 1, Curve A. 
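The two special cases above, and the averaging over positions that leads to Curve A, can be sketched numerically. This is a minimal illustration, assuming normally distributed errors and unknowns spread uniformly across the interval; the function names and the midpoint-rule step count are choices made here, not part of the paper.

```python
from math import erf, sqrt

def Phi(t):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))

def prob_correct_at(offset, interval, sigma=1.0):
    """P(correct match) for an unknown lying `offset` from its nearest standard.

    Valid for |offset| <= interval / 2.
    """
    half = interval / 2.0
    # The reading is rounded correctly when it stays within `half` of the nearest standard.
    return Phi((half - offset) / sigma) - Phi((-half - offset) / sigma)

def expected_prob_correct(interval, sigma=1.0, steps=2000):
    """Average of prob_correct_at over offsets uniform on the interval (midpoint rule)."""
    half = interval / 2.0
    width = interval / steps
    total = 0.0
    for i in range(steps):
        offset = -half + (i + 0.5) * width
        total += prob_correct_at(offset, interval, sigma)
    return total / steps

print(prob_correct_at(0.0, 2.0))    # unknown coincides with an even integer
print(prob_correct_at(1.0, 2.0))    # unknown at an odd integer
print(expected_prob_correct(2.0))   # average over the whole interval
```

Evaluating `expected_prob_correct` over a range of interval widths traces out a curve of the kind described for Curve A.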

Provided that a suitable number and diversity of specimens with known 
values are available, the standard deviation of the measuring process can be 
ascertained by entering the chart with the proportion of correct matches. The 
interval between standards is known and this leads to an immediate estimate of σ.
The preparation of a sufficient number of knowns might mean considerable labor
so an alternative procedure for estimating σ will be considered.

The proportion of unknowns that two operators, working independently, 
will agree on in their selection of the matching standard is related to the propor- 
tion of correct matches attained by an individual operator. If the assumption 
is made that the two operators are equal in skill, it is possible to calculate the 
proportion of cases in which two operators agree for unknowns in any particular 
position in the interval between standards. Agreement does not guarantee 
correctness; both may be off in the same way. 

Integration over all possible positions within the interval gave Curve B in 
Figure 1. This curve shows the average proportion of unknowns upon which 
operators will agree as a function of the interval between standards. An estimate 
of σ may now be obtained by assigning two operators, working independently,
for a reasonable sequence of the regular work load of samples. At the same time, 
an estimate is obtained of the proportion of correct matches by each operator. 
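Reading σ off either curve amounts to inverting a monotone function. The sketch below does this by bisection, using the closed forms derived later in the paper (equations (12) and (15)) rewritten in terms of r = interval/σ. The function names, the bisection bounds, and the example proportions (61% correct, 49% agreement, 0.2 pH spacing) are assumptions made for illustration.

```python
from math import erf, exp, pi, sqrt

def Phi(t):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))

def phi(t):
    """Standard normal density."""
    return exp(-0.5 * t * t) / sqrt(2.0 * pi)

def curve_a(r):
    """Expected proportion of correct matches when interval = r * sigma."""
    return (Phi(r) - Phi(-r)) + (2.0 / r) * (phi(r) - phi(0.0))

def curve_b(r):
    """Expected proportion of agreements between two equally skilled operators."""
    s = r / sqrt(2.0)
    return (Phi(s) - Phi(-s)) + (2.0 * sqrt(2.0) / r) * (phi(s) - phi(0.0))

def sigma_from_proportion(p, interval, curve):
    """Solve curve(interval / sigma) = p for sigma by bisection on r = interval / sigma."""
    lo, hi = 1e-6, 1e3            # assumed search bounds for r
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if curve(mid) < p:        # both curves increase with r
            lo = mid
        else:
            hi = mid
    return interval / (0.5 * (lo + hi))

# Hypothetical observations with standards spaced 0.2 pH apart.
print(sigma_from_proportion(0.61, 0.2, curve_a))   # from proportion of correct matches
print(sigma_from_proportion(0.49, 0.2, curve_b))   # from proportion of agreements
```

With these made-up proportions both routes give an estimate of σ near 0.1 pH, which is the kind of cross-check the two curves make possible.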


SPACING THE STANDARDS


The spacing of the standards used for matching depends on convenience and
also on certain attitudes in those using the procedure. It may be that it is desirable
to succeed in picking the correct standard, say, 80 percent of the time. Or the
worker may be unwilling to have a greater risk than 0.05 or 0.02 of being “off”
more than one standard. The chance of being off more than one standard appears
unimportant with the customary spacing of standards. About 4% of results
will be off by more than one standard when the interval is as small as 1.5σ.
Once it is recognized that the standard deviation of any given matching process
is fixed within narrow limits for skilled observers, either of the above
requirements suffices to establish the spacing of the reference standards.

[Figure 1: probability P plotted against the interval between standards, in multiples of σ, from 0 to 10.]

Figure 1. Curve A shows the probability, P, of a successful match as a function of the
interval between standards. Curve B shows the probability, P, that two observers select
the same standard as a function of the interval between standards.

Operators sometimes split the difference when an unknown appears to be 
midway between two standards. In general this interpolation is not made if 
the unknown appears to be closer to one of the standards. If the interpolation 
were made whenever the unknown was closer to the midpoint of the interval 
than to either of the adjacent standards, there would be as many interpolations 
as non-interpolations in the laboratory records. In practice interpolations are 
usually in a distinct minority so that the effect of interpolation on the procedure 
is, in most cases, rather small. 

The evaluation of the information obtained in a matching procedure involves 
more than just the proportion of correct matches. Reducing the interval will 
increase the chance of being off by one or more but it must not be forgotten 
that the observer is then off by a smaller amount. It is shown below that
making the width of the interval less than σ does not cut the average
absolute value of the “miss” appreciably.

The selection of the distance between reference standards will inevitably be 
influenced by various matters of convenience such as having the values of the 
standards set at whole numbers or convenient fractions thereof. 

Convenience and loss of information have to be considered. Loss of information
is measured in this paper by the average deviation from the true value. For
three rather wide intervals four items of interest are listed below:

                                                  Size of interval
                                                 1.0σ     2.0σ     3.0σ
Standard deviation                              1.041σ   1.155σ   1.323σ
Average deviation or miss                       0.831σ   0.925σ   1.067σ
Proportion of times correct                     0.3687   0.6095   0.7343
Proportion of unknowns on which two
  observers agree                               0.2709   0.4860   0.6296

The laboratory man, on discovering that two independent observers agree on 
only about half of the unknowns, may be disappointed. Perhaps the first impulse 
is to conclude that one or the other of the observers is not making the requisite 
effort to achieve a correct match. This may be true but the more likely explana- 
tion is that the distance between the standards accounts for the disagreements. 
The point that must be grasped is that raising the proportion of agreements 
between two observers (presumably by increasing the distance between stan- 
dards) causes loss of information. The coarse grouping always adds its contribu- 
tion to the inherent variance of matching. Notice, however, that even with an 
interval equal to the standard deviation, the average deviation is increased 
only to 0.831σ from the value 0.798σ corresponding to zero interval.

The average deviation given in the table is itself an average over all possible 
values of the unknown. It is not the average deviation for repeated observations 
on an unknown in a particular position in the interval. For a very wide interval, 
say of width 6σ, the average deviation for observations on a particular unknown
ranges from approximately zero to approximately 3σ, depending on whether
the unknown coincides with a standard or is midway between two standards. 

Statisticians usually prefer to restrict the rounding or grouping of data to 
intervals not exceeding about one fourth of the standard deviation [3]. Certainly 
it is important to preserve all the information when only limited amounts of 
research data are available for study. A different situation exists in the use of 
a matching procedure as a routine method of measurement. A matching pro- 
cedure will have been studied in advance. In any event, the economy and con- 
venience of a matching procedure must be balanced against some loss of informa- 
tion. The analysis presented in this paper makes it a simple matter for the 
investigator to determine the loss of information associated with a particular 
spacing of the standards. A spacing for the standards can then be selected that 
will, when all relevant matters are taken into consideration, best serve the needs 
of the investigator. 




DERIVATION OF MATHEMATICAL FORMULAE


The process of matching can be thought of as measuring the unknown true 
value of some characteristic of an object, and of comparing the measured value 
with a series of equally spaced standard values. Then the object is assigned 
the standard value which is closest to the measured value. This will be called 
the assigned standard value, to distinguish it from the standard value which is 
closest to the true value of the object. Of course the assigned standard value 
often will be the same as the closest standard value. 

We now shall describe this process symbolically. Let the true value of the 
unknown be 0, the measured value be Y, the assigned standard value be X, 
and the assigned standard value minus the measured value be Z. Then 


(1)    $$X = Y + Z .$$


The diagram below shows a possible configuration for the process.

[Diagram: a number line marking, from left to right, the closest standard value, the true value (0), the measured value, and the assigned standard value.]


In this process the grid of standard values is fixed. Let the standard values
be separated by intervals of length 2a. Then we could assume that the true 
value of the unknown is uniformly distributed in the interval (—a, a). An 
equivalent assumption, which is made in this paper, is that the true value is 
fixed at 0 and the grid of standard values is imposed at random [2]. 

Let Y be a random variable which is normally distributed with mean zero
and standard deviation σ, let Z be uniformly distributed on the interval (−a, a),
and let Y and Z be statistically independent. The process is now completely
specified, and we shall develop certain aspects of it.
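Because the process is fully specified by Y and Z, it can be simulated directly. The following is a minimal Monte Carlo sketch, not part of the original derivation: the sample size, seed, and function name are arbitrary choices, and the event definitions rely on the fact that a match is correct when |X| < a and is off by more than one standard when |X| > 3a.

```python
import random

def simulate_matching(interval, sigma=1.0, n=200_000, seed=1):
    """Estimate P(correct match) and P(off by more than one standard) by simulation."""
    rng = random.Random(seed)
    a = interval / 2.0                 # standards are separated by 2a
    correct = 0
    off_more_than_one = 0
    for _ in range(n):
        y = rng.gauss(0.0, sigma)      # measured value Y (true value fixed at 0)
        z = rng.uniform(-a, a)         # Z: assigned standard minus measured value
        x = y + z                      # assigned standard value X
        if abs(x) < a:
            correct += 1               # X is the standard closest to the true value
        elif abs(x) > 3.0 * a:
            off_more_than_one += 1     # X is two or more standards away
    return correct / n, off_more_than_one / n

print(simulate_matching(2.0))   # interval of 2 sigma
print(simulate_matching(1.5))   # interval of 1.5 sigma
```

The simulated proportions can be compared with the closed-form results obtained in the remainder of this section.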

For this purpose we shall need the probability density function g(x) of X.
Let f_1(u_1) and f_2(u_2) be the probability density functions of the statistically
independent random variables U_1 and U_2, respectively. In [4] it is shown that
the probability density function f(u) of the random variable U = U_1 + U_2
is given by

(2)    $$f(u) = \int_{-\infty}^{\infty} f_1(u - \xi)\, f_2(\xi)\, d\xi .$$
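Explicitly, the probability density function of Y is

(3)    $$\frac{1}{\sigma}\,\phi\!\left(\frac{y}{\sigma}\right) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-y^2/2\sigma^2}, \qquad -\infty < y < \infty ,$$

and that of Z is

(4)    $$h(z) = \begin{cases} 0, & z < -a \\ 1/2a, & -a \le z \le a \\ 0, & a < z . \end{cases}$$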

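Substituting from (3) and (4) into (2), we obtain

(5)    $$g(x) = \int_{-a}^{a} \frac{1}{\sigma}\,\phi\!\left(\frac{x - \xi}{\sigma}\right) \frac{1}{2a}\, d\xi = \frac{1}{2a}\left[\Phi\!\left(\frac{x + a}{\sigma}\right) - \Phi\!\left(\frac{x - a}{\sigma}\right)\right] ,$$

where

(6)    $$\Phi(t) = \int_{-\infty}^{t} \frac{1}{\sqrt{2\pi}}\, e^{-\xi^2/2}\, d\xi .$$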

We now seek the probability P_A, corresponding to Curve A in the figure,
that an unknown object will be assigned the standard value which is closest
to its true value. This will happen if, and only if, the assigned standard value is
within a units of the true value zero, so that the desired probability is given by

(7)    $$P_A = \int_{-a}^{a} g(x)\, dx = \frac{1}{2a} \int_{-a}^{a} \left[\Phi\!\left(\frac{x + a}{\sigma}\right) - \Phi\!\left(\frac{x - a}{\sigma}\right)\right] dx .$$


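This may be written as

(8)    $$P_A = 2a \int_{-a}^{a} \frac{1}{2a}\left[\Phi\!\left(\frac{-x + a}{\sigma}\right) - \Phi\!\left(\frac{-x - a}{\sigma}\right)\right] \frac{1}{2a}\, dx .$$

On comparing (8) with (2) we observe that (8) is 2a times the probability
density function, evaluated at w = 0, of the random variable

(9)    $$W = X + Z_1 = Y + Z + Z_1 ,$$

where Z_1 is uniformly distributed as in (4), and is statistically independent of
Y and Z.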


The distribution of T = Z + Z_1 is triangular, with probability density function

(10)    $$p(t) = \begin{cases} 0, & t < -2a \\ (2a - |t|)/(2a)^2, & -2a \le t \le 2a \\ 0, & 2a < t . \end{cases}$$





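Again using (2) for W = Y + T, (7) and (8) may be written as

(11)    $$P_A = 2a \int_{-2a}^{2a} \frac{1}{\sigma}\,\phi\!\left(\frac{t}{\sigma}\right) \frac{2a - |t|}{(2a)^2}\, dt = \int_{-2a}^{2a} \frac{1}{\sigma}\,\phi\!\left(\frac{t}{\sigma}\right) dt - \frac{1}{a} \int_{0}^{2a} \frac{t}{\sigma}\,\phi\!\left(\frac{t}{\sigma}\right) dt ,$$

which is readily evaluated to give

(12)    $$P_A = \left[\Phi\!\left(\frac{2a}{\sigma}\right) - \Phi\!\left(-\frac{2a}{\sigma}\right)\right] + \frac{\sigma}{a}\left[\phi\!\left(\frac{2a}{\sigma}\right) - \phi(0)\right] .$$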


Next we seek the probability P_B, corresponding to Curve B of the figure,
that two independent measurements on the same object will result in the same
assigned standard value, regardless of whether it is the closest standard value.
If the first measurement results in assigned standard value x, then the second
measurement must lie within a units of x. Accordingly, the probability element
for the joint occurrence of these events is

$$g(x)\, dx \left[\Phi\!\left(\frac{x + a}{\sigma}\right) - \Phi\!\left(\frac{x - a}{\sigma}\right)\right] = 2a\,[g(x)]^2\, dx ,$$

and the desired probability is

(13)    $$P_B = 2a \int_{-\infty}^{\infty} [g(x)]^2\, dx .$$


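Comparing (13) with (2), we observe that P_B/2a is the probability density
function, evaluated at s = 0, of the random variable

(14)    $$S = Y + Z + Y_1 + Z_1 ,$$

where Y_1 is normally distributed according to (3), and is statistically independent
of Y, Z, and Z_1. Using the facts that Y + Y_1 is normally distributed with
variance 2σ², and Z + Z_1 is triangularly distributed according to (10), and
using (2), it follows that

$$P_B = 2a \int_{-2a}^{2a} \frac{1}{\sqrt{2}\,\sigma}\,\phi\!\left(\frac{t}{\sqrt{2}\,\sigma}\right) \frac{2a - |t|}{(2a)^2}\, dt = \int_{-2a}^{2a} \frac{1}{\sqrt{2}\,\sigma}\,\phi\!\left(\frac{t}{\sqrt{2}\,\sigma}\right) dt - \frac{1}{a} \int_{0}^{2a} \frac{t}{\sqrt{2}\,\sigma}\,\phi\!\left(\frac{t}{\sqrt{2}\,\sigma}\right) dt ,$$

which may be written as

(15)    $$P_B = \left[\Phi\!\left(\frac{\sqrt{2}\,a}{\sigma}\right) - \Phi\!\left(-\frac{\sqrt{2}\,a}{\sigma}\right)\right] + \frac{\sqrt{2}\,\sigma}{a}\left[\phi\!\left(\frac{\sqrt{2}\,a}{\sigma}\right) - \phi(0)\right] .$$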


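Finally, we seek the expected value of |X|, which is defined as

(16)    $$E|X| = \int_{-\infty}^{\infty} |x|\, g(x)\, dx = 2 \int_{0}^{\infty} x\, g(x)\, dx .$$

Integrating by parts, we obtain

$$E|X| = -\frac{1}{2a\sigma} \int_{0}^{\infty} x^2 \left[\phi\!\left(\frac{x + a}{\sigma}\right) - \phi\!\left(\frac{x - a}{\sigma}\right)\right] dx ,$$

which gives

(17)    $$E|X| = \frac{\sigma^2 + a^2}{2a}\left[\Phi\!\left(\frac{a}{\sigma}\right) - \Phi\!\left(-\frac{a}{\sigma}\right)\right] + \sigma\,\phi\!\left(\frac{a}{\sigma}\right) .$$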
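The closed forms for the probability of a correct match, the probability of agreement, and E|X| can be evaluated together for a given interval 2a. The sketch below is an illustration under the paper's model; the function name `interval_summary` is hypothetical, and the standard deviation line uses the elementary fact that Var(X) = σ² + a²/3 for independent Y and Z, which the paper tabulates but does not derive.

```python
from math import erf, exp, pi, sqrt

def Phi(t):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))

def phi(t):
    """Standard normal density."""
    return exp(-0.5 * t * t) / sqrt(2.0 * pi)

def interval_summary(interval, sigma=1.0):
    """Return (standard deviation, average miss, P correct, P agree) for standards 2a apart."""
    a = interval / 2.0
    # Var(Y) + Var(Z) with Z uniform on (-a, a); assumes Y and Z independent.
    sd = sqrt(sigma ** 2 + a ** 2 / 3.0)
    # Expected absolute miss, E|X|.
    miss = ((sigma ** 2 + a ** 2) / (2.0 * a)) * (Phi(a / sigma) - Phi(-a / sigma)) \
        + sigma * phi(a / sigma)
    # Probability of a correct match (Curve A).
    p_correct = (Phi(2.0 * a / sigma) - Phi(-2.0 * a / sigma)) \
        + (sigma / a) * (phi(2.0 * a / sigma) - phi(0.0))
    # Probability that two independent observers agree (Curve B).
    r = sqrt(2.0) * a / sigma
    p_agree = (Phi(r) - Phi(-r)) + (sqrt(2.0) * sigma / a) * (phi(r) - phi(0.0))
    return sd, miss, p_correct, p_agree

for width in (1.0, 2.0, 3.0):
    print(width, [round(v, 4) for v in interval_summary(width)])
```

For intervals of 1σ, 2σ and 3σ the output should agree with the table in the section on spacing the standards to roughly the third decimal place.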


REFERENCES 


[1] W. M. Clark, (1928), The Determination of Hydrogen Ions, The Williams and Wilkins
Company, Baltimore, p. 64.

[2] C. Eisenhart, M. W. Hastay and W. A. Wallis, (1947), Selected Techniques of Statistical
Analysis, McGraw-Hill Book Company, Inc., New York, Chapter 4.

[3] R. A. Fisher, (1950), Statistical Methods for Research Workers, Oliver and Boyd, Edinburgh,
Section 13.

[4] H. Cramér, (1951), Mathematical Methods of Statistics, Princeton University Press,
Princeton, p. 191.


an 








Vol. 1, No. 2 TECHNOMETRICS May, 1959


Random Balance Experimentation 


F. E. SATTERTHWAITE 
Statistical Engineering Institute, 
Wellesley, Mass., 
and Merrimack College, 
North Andover, Mass. 


Random balance experimental design has been used in industrial applications of 
statistical methods since 1956. The purpose of this report is to record and discuss in 
specific form key points regarding this technique. This report will be divided into 
several parts that are essentially separate. 

The topics to be covered include the following: 


(1) Random design and the motivations for use of random designs in industrial,
engineering and research activities. 

(2) Techniques for analysis of random data and the apparent application domains 
for which each technique is appropriate. 

(3) Fundamental problems of statistical analysis that are of critical importance 
in some random balance applications. 


Part I. Random Designs


The design of an experimental program is the matrix of input variable values. 
Thus a program of four experiments to study the effects of temperature and 
pressure might have the design: 


Experiment No. Temperature Pressure 


1 60 15 
2 70 20 
3 70 15 
4 60 20 


The design matrix is, in this example, the matrix:

$$X = \begin{pmatrix} 60 & 15 \\ 70 & 20 \\ 70 & 15 \\ 60 & 20 \end{pmatrix}$$







A variable, x_2, is said to have exact balance with respect to a variable, x_1,
in an experimental design if the exact distribution of x_2 is the same for each
value of x_1. In the above design, the pressure, x_2, has exact balance with respect
to the temperature, x_1, since it has the actual distribution, (15, 20), for
temperature 60 and the identical actual distribution, (15, 20), for temperature 70.
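The definition of exact balance translates into a direct check on the design matrix. The sketch below is illustrative; `has_exact_balance` is a hypothetical helper, and columns are identified by zero-based position (0 for temperature, 1 for pressure).

```python
from collections import defaultdict

def has_exact_balance(rows, of, wrt):
    """True if column `of` has the same actual distribution at every value of column `wrt`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[wrt]].append(row[of])
    # Compare the sorted list of values observed at each level of the `wrt` column.
    distributions = [sorted(values) for values in groups.values()]
    return all(d == distributions[0] for d in distributions)

# The four-experiment design from the text: (temperature, pressure).
design = [(60, 15), (70, 20), (70, 15), (60, 20)]
print(has_exact_balance(design, of=1, wrt=0))
```

For the design above the check succeeds, because pressure takes the values (15, 20) at both temperatures.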

A random design is one for which a random sampling process is used to choose 
all or some of the elements of the design matrix, X. If we should carry out the
above experiment, good practice would be to conduct the four experiments in 
random order. This order of conducting the experiments is in fact a third variable 
and the complete design might be, for example, 


x_1    x_2    x_3
60     15     [illegible]
70     20     2
70     15     [illegible]
60     20     1


where the third column indicates that experiment number 4 is to be conducted 
first, experiment number 2 is to be conducted second, etc. The order numbers 
shown in the third column are selected by an appropriate random sampling 
process. Therefore the design is a random design with respect to the order
variable, x_3. (It is a fixed design with respect to variables x_1 and x_2 since no
random sampling process was used to select the values of x_1 and x_2.)
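Assigning the run order by a random sampling process can be sketched as a shuffle of the order numbers. This is one possible process, assumed here for illustration; the seed and the function name `add_random_order` are arbitrary.

```python
import random

def add_random_order(design, rng):
    """Append a randomly permuted order number (1..n) to each row of a fixed design."""
    order = list(range(1, len(design) + 1))
    rng.shuffle(order)                      # random selection without replacement
    return [row + (k,) for row, k in zip(design, order)]

fixed_design = [(60, 15), (70, 20), (70, 15), (60, 20)]   # (temperature, pressure)
rng = random.Random(1959)                                  # arbitrary seed
for row in add_random_order(fixed_design, rng):
    print(row)
```

Since the same shuffle is applied regardless of the temperature and pressure values, the order column is selected by one identical process for every row.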

A variable, x_3, is said to have random balance with respect to the variables,
x_1 and x_2, if the random sampling process used to select the x_3-values associated
with any one (x_1, x_2)-combination is identical to the random sampling process
used to select the x_3-values associated with every other (x_1, x_2)-combination.
Thus in the above design,


x_1    x_2    x_3
60     15     [illegible]
70     20     2
70     15     [illegible]
60     20     1


the variable x_3 has random balance with respect to (x_1, x_2) since the random
process used to assign x_3-values was identical for the four (x_1, x_2)-combinations:


(60,15), (70, 20), (70, 15), (60, 20). 


A variable x_2 is said to have random unbalance with respect to a variable, x_1,
if the random sampling process used to select the x_2-values associated with any
x_1-value is different from the random sampling process used to select the x_2-values
associated with any other x_1-value. For example, consider the design constructed
as follows:


No.    x_1    x_2 = 0.1 x_1 + e    e

1      60     7                   +1
2      60     6                    0
3      70     7                    0
4      70     6                   −1
5      80     9                   +1
6      80     8                    0

where e is a random selection from the uniform distribution, (+1, 0, —1). 
The variable x_2 has random unbalance with respect to x_1 since different
distributions were used to select x_2:


x_1    Distribution used to randomly select x_2

60     (5, 6, 7)
70     (6, 7, 8)
80     (7, 8, 9)
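The construction of this unbalanced design can be sketched as follows. The function names and the seed are hypothetical choices for illustration; integer division by 10 stands in for 0.1 x_1 because every x_1 value here is a multiple of 10.

```python
import random

def unbalanced_design(x1_values, rng):
    """Rows (x1, x2, e) with x2 = 0.1 * x1 + e and e drawn uniformly from (+1, 0, -1)."""
    rows = []
    for x1 in x1_values:
        e = rng.choice([+1, 0, -1])
        x2 = x1 // 10 + e        # 0.1 * x1 in integer arithmetic (x1 is a multiple of 10)
        rows.append((x1, x2, e))
    return rows

def x2_support(x1):
    """Values from which x2 is randomly selected at a given x1."""
    return tuple(x1 // 10 + e for e in (-1, 0, +1))

rng = random.Random(1959)        # arbitrary seed
for row in unbalanced_design([60, 60, 70, 70, 80, 80], rng):
    print(row)
print({x1: x2_support(x1) for x1 in (60, 70, 80)})
```

The supports differ from one x_1 value to the next, which is exactly what makes x_2 randomly unbalanced with respect to x_1.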


Notice that in the above definitions, no limitations have been placed on the 
type of random sampling process used to select specific x-input variable values.
In practice many types of random processes are used: 


(a) Random selection with or without replacement. 
(b) Random selection from uniform, normal, binomial, or any other appropriate
distributions (including arbitrary distributions).

(c) Random selection of a single variable or the joint selection of two or more 
variables from a multivariable distribution. 

(d) Random selection from discrete or from continuous distributions. 

(e) Restricted or unrestricted random selections. A common restriction is to 
throw out any selections that cause correlation coefficients between 
pairs of input variables that are outside prechosen limits. 


A pure random balance design is one for which: 


(a) The values of each input variable are selected by an appropriate random 
process. 

(b) The random selection process used for a specific input variable is com- 
pletely independent of the specific values selected for all other input 
variables. 


In other words, in a pure random balance design, every input x-variable has
random balance with respect to every other input variable. 
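
A minimal sketch of conditions (a) and (b) follows. Each variable is drawn independently of the values already drawn for the others. The level sets for `x1` and `x2` are borrowed from the earlier example; the levels for `x3`, the function name, and the choice of a uniform selection are hypothetical choices made for illustration.

```python
import random


def pure_random_balance_design(levels, n_runs, seed=None):
    """Draw every input variable independently for every run.

    levels: dict mapping a variable name to the list of values it may take.
    The draw for one variable never looks at the values of the others.
    """
    rng = random.Random(seed)
    return [
        {name: rng.choice(values) for name, values in levels.items()}
        for _ in range(n_runs)
    ]


# Hypothetical level sets; only the x1 and x2 levels echo the text.
levels = {"x1": [60, 70], "x2": [15, 20], "x3": [1, 2, 3]}
design = pure_random_balance_design(levels, n_runs=8, seed=1)
for run in design:
    print(run)
```

Any of the selection processes listed earlier (normal, binomial, without replacement, and so on) could replace `rng.choice` without changing the structure of the sketch.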

In practice pure random balance designs are often restricted by eliminating 
the selections for any input variable that cause correlation coefficients with 
any other input variable larger than a prechosen limit. Such designs are called 
restricted pure random balance designs. 
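
The restriction can be sketched as a rejection step: a freshly drawn column for one variable is discarded and redrawn whenever its correlation with a previously accepted column exceeds the prechosen limit. The limit of 0.3, the level sets, the rule of also rejecting constant columns, and all function names are assumptions for this sketch, not values taken from the paper.

```python
import math
import random


def pearson_r(u, v):
    """Ordinary product-moment correlation of two equal-length, non-constant lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return suv / math.sqrt(suu * svv)


def restricted_design(levels, n_runs, limit, seed=None, max_tries=1000):
    """Draw columns one variable at a time, redrawing any column whose
    correlation with an already accepted column exceeds `limit`."""
    rng = random.Random(seed)
    columns = {}
    for name, values in levels.items():
        for _ in range(max_tries):
            candidate = [rng.choice(values) for _ in range(n_runs)]
            if len(set(candidate)) < 2:
                continue  # a constant column has no defined correlation
            if all(abs(pearson_r(candidate, col)) <= limit
                   for col in columns.values()):
                columns[name] = candidate
                break
        else:
            raise RuntimeError("no acceptable selection found for " + name)
    return columns


levels = {"x1": [60, 70], "x2": [15, 20], "x3": [1, 2, 3]}  # hypothetical levels
design = restricted_design(levels, n_runs=16, limit=0.3, seed=2)
names = list(design)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(a, b, round(pearson_r(design[a], design[b]), 3))
```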







A pure random balance design is almost always inefficient in some sense. 
Therefore pure random balance designs are seldom used, except incidentally, 
for major investigations. On the other hand, experience has shown that there 
are important and large classes of problems for which these inefficiencies are 
often unimportant or even trivial. These include: 


(a) Investigations where a large amount of data can be collected quickly at 
low cost. 

(b) Investigations planned, conducted, analyzed, and interpreted by persons 
with relatively little technical, statistical, or mathematical training. 
The number of "little" problems that must be solved by "little" people
is very large.

(c) Investigations where random designs simplify the administration of the 
experimentation. Experience has uncovered a large number of situations 
where administrative simplicity is a dominant, or even a compelling,
reason for use of a pure random balance design. 

(d) When experimentation is conducted on regular manufacturing operations, 
particularly trouble-shooting investigations. 

(e) Continuous experiments to optimize processes, products, or systems. 

(f) Preliminary investigations and sub-investigations in connection with 
major investigations. 

(g) Investigations where the use of the conclusions is for temporary or other 
minor purposes. 


A pure random design (generally unbalanced) is one where we define the 
domain of all possible combinations of the input variables that are pertinent to 
the investigation and then take an appropriate random sample from this multi- 
variable population of combinations. 

A pure random design is particularly appropriate when we have little prior 
knowledge of the functional form of the relationships between the input and 
output variables at the time the experiments are planned. That is, we do not 
know the "model",


$$y = f(x_1, x_2, \ldots, x_n),$$


that will be used to analyze and interpret the data. The terms, balance and 
unbalance, have no meaning without such knowledge of the model. For example, 
the design, 


Temperature (T)    Pressure (P)


600 10 
600 20 
700 10 
700 20 


is an exact balance design with respect to the model 


$$y = a_0 + a_1 f_1(T) + a_2 f_2(P) + e.$$







However, with respect to the model


$$y = a_0 + a_1 (PT) + a_2 (T/P) + e,$$


the design is


(PT)      (T/P)


 6000      60
12000      30
 7000      70
14000      35


This design is thus very unbalanced with respect to the second model. If the 
second model should be approximately the true model, this design would be 
very inefficient to evaluate the unknown coefficients. 
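
The unbalance can be checked numerically. The sketch below computes the product-moment correlation between the two input columns under each model; using the correlation coefficient as the measure of unbalance is an assumption of this illustration, as is the helper name `pearson_r`.

```python
import math


def pearson_r(u, v):
    """Ordinary product-moment correlation of two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return suv / math.sqrt(suu * svv)


T = [600, 600, 700, 700]
P = [10, 20, 10, 20]

# Columns that enter the second model.
PT = [t * p for t, p in zip(T, P)]
T_over_P = [t / p for t, p in zip(T, P)]

r_first_model = pearson_r(T, P)            # exact balance: zero correlation
r_second_model = pearson_r(PT, T_over_P)   # strongly negative, about -0.90

print(r_first_model, round(r_second_model, 3))
```

The same four runs that are orthogonal in $T$ and $P$ give two nearly collinear columns in $PT$ and $T/P$, which is why the coefficients of the second model would be poorly determined.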

Experience has shown that where the true model is unknown, but is expected 
to be quite complex, a random design over a multidimensional domain limited 
by the controlling physical restraints on the input variables tends to have better 
balance with respect to the true model, once it is discovered, than a design that 
is balanced with respect to the original x-input variables. (Exceptions to this
statement are numerous.) 

Before closing the discussion of random designs, it is well to correlate random 
designs with classical balanced designs. To make the discussion concrete, let 
us return to the design, 


$x_1$    $x_2$


60       15
70       20
70       15
60       20


where $x_1$ and $x_2$ have exact balance and $x_3$, the order of conducting the experiments,
has random balance with respect to the exact balance design of $x_1$ and $x_2$.
From the classical viewpoint, this is a full factorial exact balance experiment 
in x, and x, planned according to the best recommended practice. 

The variable $x_3$, order of conducting the experiment, is a variable influencing
the individual output results. It is a variable not included in the exact balance
section of the design for a very simple reason. One and only one experiment is
possible for each value of $x_3$. Therefore we cannot run every combination of
$x_1$ and $x_2$ for each value of $x_3$. We cannot balance $x_1$ and $x_2$ with respect to $x_3$.
Moreover we seldom have any useful information regarding the functional
relationship between $x_3$ and the output variables. Without a functional relationship
model, balance is meaningless.

The factor that distinguishes modern statistical design is the randomization
technique used to handle variables such as $x_3$, the order of conducting the
experiments. Formal randomization of $x_3$ with respect to the design variables
creates random balance that mathematically assures that the effects of $x_3$,
whatever their nature, will influence our interpretation of the effects of the
design variables in an unbiased manner. If such variables, like $x_3$, are not randomized
with respect to the design variables, it is only an opinion that their
effects do not bias the interpretation of the design variables. It is beyond the
power of statistical mathematics to check such opinion judgments.
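
The randomization of run order for the four-run factorial can be sketched as follows. Over many hypothetical repetitions every $(x_1, x_2)$-combination has the same average position in the run order, which is the sense in which the order variable has random balance with respect to the design variables. The repetition count, the seed, and the function name are assumptions of the sketch.

```python
import random

combos = [(60, 15), (70, 20), (70, 15), (60, 20)]


def randomize_order(combos, rng):
    """Assign the run positions 1..n to the combinations in random order."""
    order = list(range(1, len(combos) + 1))
    rng.shuffle(order)
    return dict(zip(combos, order))


rng = random.Random(0)  # arbitrary seed, only for repeatability
n_repetitions = 20000
totals = {c: 0 for c in combos}
for _ in range(n_repetitions):
    for combo, position in randomize_order(combos, rng).items():
        totals[combo] += position

# Every combination should average close to (1 + 2 + 3 + 4) / 4 = 2.5.
mean_position = {c: totals[c] / n_repetitions for c in combos}
print(mean_position)
```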

Thus we see that classical statistical designs are in fact mixed balance— 
random balance designs when we formally randomize the variables whose effects 
are assigned to the e-residuals in the model: 


$$y = a_0 + a_1 x_1 + a_2 x_2 + \cdots + e.$$


It should be strongly emphasized that the difference between classical designs 
and random balance designs is only a difference in degree, not a difference in 
kind. Classical designs make possible a rigorous analysis and interpretation 
only because the residual variables are random balance variables. Random 
balance designs differ only in the fact that, in general, the number of variables 
introduced with random balance (instead of exact or partial balance) tends to 
be larger than has been customary in past practice with classical designs. 

Because random balance designs differ only in degree, not in kind, from classical 
designs, all problems in connection with devising analysis procedures that are 
appropriate, valid, and rigorous are problems that are common to both classical 
designs and random balance designs. If the mathematical solution of the problem 
is valid for classical designs, it is valid for random balance designs. If no rigorous 
mathematical solution is available for some aspect of analyzing random balance 
designs, neither is one available for the same aspect of analyzing classical designs. 

It is natural that misunderstandings have arisen as a result of failures to 
recognize the essential mathematical identity of classical and random designs. 
Random designs tend to be applied to investigations that are more complex 
than the investigations for which classical designs are used. Specifically, classical 
designs are applied to (and in practice usually limited to) investigations involving 
not more than five to ten design variables. On the other hand, random balance 
designs are seldom used (and are in general inefficient) with so few design variables.
The current median number of variables in random balance investigations
appears to be between 10 and 15. The writer does not know of any theoretical
limit on the number of variables that can be investigated simultaneously by
random balance designs. The practical limits appear to be controlled by economic 
and administrative considerations. As experience accumulates, these economic 
and administrative limits tend to be pushed farther and farther away. 

Because the application areas for classical designs and random designs hardly 
overlap, the differences, which mathematically are only differences in degree, 
do in practice often become so large that for practical purposes they take on the 
characteristics of differences in kind. Mathematical questions which may be 
important in investigations for which classical designs are used may be reduced 
to minor importance for applications where random balance designs are used. 
The fact that such questions appear to be ignored by users of random balance
does not necessarily imply that such users are naive or ignorant or that they 
are taking undue risks. It generally implies that they are good managers in the 
sense that they concentrate time and effort on the important questions whose 
answers give the largest return for the investment. 







We should now point out that important complex major investigations usually 
use mixed exact and random balance designs when conducted by technically 
trained personnel. The random design may have imbedded within it full factorial 
designs, Latin square designs, and fractional factorial designs, and also unbalanced 
variables that require analysis by multiple regression methods. A most gratifying 
(and unexpected) result observed on the introduction of random balance methods 
to an organization is the sudden demand, incentive, and interest created for 
knowledge and competence in classical statistical methods and theory. Experience 
shows that random balance is not a substitute for nor a competitor of classical 
methods. Introduction of random balance usually increases the use of classical 
designs. Do not forget that classical exact balance designs are intuitively desirable 
designs for investigations involving a small number of variables to be analyzed 
with respect to a simple model. Small investigations and simple models will 
always outnumber large investigations and complex models. 

The differences in the usual application domains of classical and random 
designs cause several questions to become important which were of minor 
importance heretofore: 

(a) Assumptions regarding the residual elements such as independence, 
variance homogeneity, and normal distribution. In complex investigations
one or more of these assumptions may fail to hold to a degree that makes invalid 
analyses dependent on these assumptions. 

(b) Assumptions that the functional form of the model is known and is simple 
including the assumption that higher order interactions are unimportant. As 
an investigation becomes sufficiently complex, it becomes increasingly certain 
that the true model may contain surprises, that it may involve non-linear 
relationships (true models are often non-linear solutions to differential equations), 
that some dominant variables in the model may be complex functions of the 
input design variables, and that some important high order interactions may 
show up. In such cases the initial design and analyses are dominated by the 
purpose of discovering the appropriate model. 

(c) Extrapolation. In major complex investigations it may be necessary to 
extrapolate conclusions beyond the range of input variable values available for 
current experimentation. Such extrapolations have neither technical nor sta- 
tistical validity unless made in terms of a true physical model. This makes 
important the use of designs that uncover hints of possible physical models and 
designs that sharply discriminate between alternative possible physical models. 
It also may mean very precise evaluation of key unknowns in the model. 

(d) Significance levels. Current texts on experimental design consider primarily 
the 5 percent and 1 percent significance levels. In important complex investiga- 
tions these significance levels may be used only for minor decisions such as the 
planning of additional experimentation. Important final decisions may require 
much smaller risks. For example, in reliability work routine decisions should be 
made with risks of the order of 10-° to 10~"° and the number of such decisions 
may be in the thousands. 

(e) Multiple-test bias. In complex investigations the number of significance 
tests calculated may be quite large. If, for example, 10,000 significance tests are
calculated (without correction for multiple-test bias), then we expect that 100
of these tests will indicate significance at the 1 percent level even if all true 
effects are zero. 

(f) Pooling bias. If, in choosing an estimate of residual error, we use signifi- 
cance tests for decisions as to what effects will be left in the residual and what 
effects will be removed, our estimate (if uncorrected for pooling bias) of residual 
error will be biased to an unknown degree. Unfortunately this pooling bias is 
usually in a direction that is not conservative. The opportunities to introduce 
pooling bias into the analysis increase so rapidly with the complexity of the 
investigation that conservative pooling bias corrections become a serious problem. 

(g) Calculation considerations. The volume of calculations required for a 
complete analysis by classical multiple regression and analysis of variance 
routines generally increases rapidly as the complexity of the investigation and 
the complexity of the possible models increases. Thus an investigation need 
become only mildly complex before calculation considerations may dominate 
the choice of the principles used to design the experimental program. 

(h) Number of experiments. The number of experiments required for a full 
factorial design increases exponentially with the number of variables in the 
investigation and rapidly becomes impractical. Classical exact balance designs 
offer the highly fractionated factorial as the only solution. Such designs are 
often unacceptable to the person responsible for the investigation for one or 
more of the following reasons: 


(i) The principles required to lay out the design for the required number 
of levels of each variable are not available.
(ii) The number of experiments required may still be too large. 


(iii) The design produces 100 percent confounding of large numbers of possible 
effects and makes certain that the data will contain zero information 
to resolve such confoundings. 
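
The direction of the pooling bias described in item (f) above can be illustrated by a small simulation. It is a deliberately simplified sketch and not the author's procedure: all true effects are zero, the true residual variance is 1, each effect contributes a one-degree-of-freedom mean square, and effects are screened against the approximate 5 percent point of chi-square with one degree of freedom (3.84) as if the true variance were known. The number of effects, the trial count, and the seed are arbitrary.

```python
import random

CRITICAL = 3.84  # approximate 5 percent point of chi-square, 1 degree of freedom


def pooled_estimate(n_effects, rng):
    """Pool the mean squares judged 'not significant' into a residual estimate.

    Every mean square is an honest estimate of the true variance (1.0),
    because all true effects are zero in this sketch.
    """
    mean_squares = [rng.gauss(0.0, 1.0) ** 2 for _ in range(n_effects)]
    kept = [ms for ms in mean_squares if ms < CRITICAL]
    return sum(kept) / len(kept) if kept else None


rng = random.Random(7)  # arbitrary seed
estimates = [pooled_estimate(20, rng) for _ in range(5000)]
estimates = [v for v in estimates if v is not None]
average_pooled = sum(estimates) / len(estimates)

# The average falls clearly below the true value of 1.0.
print(round(average_pooled, 2))
```

Removing the largest mean squares leaves a residual estimate that is too small on average, so later significance tests based on it are too liberal. This is the non-conservative direction the text refers to.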


Random balance designs provide a competitor to highly fractionated factorial 
designs as a means for reducing the number of experiments required to a reason- 
able figure. Investigators experienced in both methods tend to use random 
balance in preference to highly fractionated factorials. Whether they are right 
or wrong in so doing is often not pertinent. The person paying for the investi- 
gation and responsible for the conclusions is absolute boss and his desires are 
controlling. He cannot be forced to pay for data and to accept conclusions as 
valid when he lacks confidence in the principles used to collect the data. 

We now want to re-emphasize that the above problems exist and are mathe- 
matically identical for classical and random designs. Their relative importance 
is a function only of the complexity of the investigation. Consider, for example, 
the multiple test bias for a balanced full factorial design in 30 variables at three 
levels each with every possible combination duplicated. That is, an experimental 
program with 

$$N = (2)(3^{30}) \approx 410{,}000{,}000{,}000{,}000$$


experiments. If we test for linear and quadratic main effects and for linear first,
second, and third order interactions, then the number of tests will be


k = 31,961. 


As an alternative, consider a pure random balance experiment over the same 
domain with 39,000 experiments plus 1000 random exact replications (i.e., a 
total of 40,000 experiments). The multiple test bias will be identical for both 
investigations. In both cases we will expect 320 tests to indicate significance at 
the 1 percent level even when all true effects are zero. 
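
The arithmetic behind these figures can be checked directly. The sketch assumes that a first, second, and third order interaction means an interaction among two, three, and four variables respectively, with one linear test per interaction. Under that reading the count comes to 31,960, one fewer than the 31,961 quoted; the difference may be a test on the general mean, but that is a guess and not something the text states.

```python
from math import comb

n_vars = 30

# Duplicated full factorial at three levels per variable.
n_runs = 2 * 3 ** n_vars

# Linear and quadratic main effects: two tests per variable.
main_effect_tests = 2 * n_vars

# Linear interactions of first, second, and third order
# (assumed to involve 2, 3, and 4 variables respectively).
interaction_tests = {order: comb(n_vars, order + 1) for order in (1, 2, 3)}

n_tests = main_effect_tests + sum(interaction_tests.values())

# Expected number of tests significant at the 1 percent level when all
# true effects are zero, using the count k = 31,961 quoted in the text.
expected_false_positives = 0.01 * 31961

print(n_runs, interaction_tests, n_tests, round(expected_false_positives))
```

The expected number of spurious significances depends only on the number of tests and the significance level, not on whether the data came from the factorial or from the 40,000-run random balance experiment.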

In remaining parts of this report we shall discuss in detail some of these 
problems that are often important in complex investigations. Unfortunately 
we shall not discuss all of them. Sufficient experience with complex investigations
(by random balance or any other method) has not accumulated to bring some 
of these problems into sufficient perspective. Until such experience accumulates, 
proper definitions of such problems, useful discussions of such problems, and 
rigorous mathematical treatment of such problems will be essentially impossible. 

Finally in closing Part I on Random Balance Designs, we shall list those 
advantages of random design which, at the time of this writing, appear to be 
most important as causes for its popularity with many classes of experimenters.
This list is, of course, subjective and time will cause many corrections. Also 
remember that an advantage in one type of application is often a disadvantage 
in another type. This list of advantages does create a serious risk, a risk that the 
untrained or inexperienced investigator will let a favorite advantage blind 
him to a serious disadvantage. Random designs may then be used in situations 
and ways that are inappropriate. With these qualifications, the advantages of 
random balance designs (combined, when appropriate, with classical designs) 
are listed: 

(a) Design simplicity. Almost anyone can be easily trained to correctly lay 
out a pure random design for any desired experimental domain. Thus the portion 
of investigations for which expert statistical guidance is desirable at the design 
stage may be substantially reduced. 

(b) Analysis simplicity. Almost anyone can be easily taught the simpler 
methods (such as pick-the-winner) for valid analysis of pure random balance 
data. If the purpose of the investigation is one for which these simple methods 
are appropriate, the need for expert statistical guidance is eliminated from the 
analysis and interpretation stages of the investigation. 

(c) Graphical analysis. Random balance data containing only a few significant
effects can be analyzed graphically by one-variable scatter plots and two-variable
contour diagrams. Random balance assures that such one- and two-variable
plots are unbiased (except for multiple test bias and, occasionally, pooling bias) 
regardless of the number of variables acting in the data. If replots of residuals 
are made after removal of clearly significant effects, graphical analysis may also 
be quite efficient. Technical and management personnel have had extensive 
experience with scatter diagram analysis. If competent, they have developed 
good judgment regarding the risks involved in interpreting such plots and the 
confidence that may be attached to conclusions reached. The true importance 
of graphical analysis, however, is due to the fact that a correct answer has no
economic value unless those who make the decisions regarding its use have
sufficient confidence to use it. Most managers, and many technical personnel, 
tend to lack confidence in numbers based on involved algebra and extended 
calculations. For real confidence they want essentially raw data clearly presented 
graphically. The key to interpretation of data is how the mean line is drawn on 
the scatter plot. The conservative decision maker will not delegate this step, 
when critical, to a clerk or a non-thinking electronic computer.

(d) Feedback and sequentialization. It is not difficult to revise a random balance 
design during the course of the investigation as preliminary analyses of the early 
experiments indicate opportunities to concentrate later experiments more 
closely in the domain of major interest. (Classical designs are generally difficult 
to revise.) It is true that such revisions cause unbalance between specific pairs of 
variables but the number of such unbalances is generally small and they are 
readily handled by multiple regression or other appropriate analysis techniques. 
The fact that any random part of a random balance design is itself random 
prevents any difficulties in performing the preliminary analyses required for 
feedback revisions. For complex investigations this ease of sequentialization 
may become of dominant importance. Often the experimental domain of major
interest, once it is located, may be a very small fraction of the domain explored 
by the initial experiments. (For example, if the domain for each of four variables 
is reduced to one-fourth of the original domain, the total domain of investigation 
is reduced by a factor of 256.) The final interest domain may not even overlap 
the original domain. A non-sequential experimental program spread over the 
original domain is then extremely inefficient. 

(e) Number of experiments. The number of experiments necessary to serve 
the purposes of an investigation appears, in general, to be dominated by the 
type of questions asked of the data, the precision required of the answers, and 
the amount of residual variation for which causes cannot be identified and effects 
evaluated. The effect of the number of variables investigated on the required 
number of experiments appears, in general, to quickly reach a saturation. In 
fact, the introduction of additional design variables often reduces the required 
number of experiments by making possible the identification of additional 
causes of, and the removal of additional effects from, the residual variation. 
Since random designs may be easily prepared for any number of experiments 
(greater than one in theory and greater than 20 in practice) and since this 
number may be freely revised by feedback during the course of the investigation, 
random balance designs are widely used because the very important factor, 
number of experiments, is under engineering control. It is not subject to the 
severe mathematical restraints often existing for classical design procedures. 

(f) Efficiency. No general statements can be made regarding the economic 
efficiency of an experimental design procedure. The economic efficiency is a 
function of the physical, cost, administrative, time, facility, and technical talent 
restraints controlling the specific investigation. These occur in practice with all 
possible combinations of extremes. The following table gives what are believed 
to be realistic statistical efficiencies for many applications of pure random balance 
experiments when compared with exact or near exact balance experiments: 







Number of Data Sets      Lower Limits on Efficiency of Pure Random Balance


                         10 percent


The table shows that random balance economic efficiencies will usually be satis- 
factory if more than 20 experiments are justified and appropriate credit is given 
for other advantages of random balance. 
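
The factor of 256 quoted in item (d) above follows from multiplying the per-variable reductions. The sketch below reproduces it; the figure of 1000 runs spread uniformly over the original domain is a hypothetical number added only to show how few of them would land in the final region of interest.

```python
from fractions import Fraction


def domain_fraction(per_variable_fraction, n_variables):
    """Fraction of the original domain that remains when the range of each of
    n_variables is cut to per_variable_fraction of its original range."""
    return Fraction(per_variable_fraction) ** n_variables


remaining = domain_fraction(Fraction(1, 4), 4)
reduction_factor = 1 / remaining

# Hypothetical: 1000 runs spread uniformly over the original domain.
expected_runs_in_region = 1000 * remaining

print(remaining, reduction_factor, float(expected_runs_in_region))
```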

Later parts of the report will discuss in detail some, but not all, of the suggested 
advantages of random balance designs. 

Part I, Random Designs, has considered only the definitions and characteristics 
of random and random balance designs independent of the methods that may be 
used for analysis of the resulting data. Part I assumes that an analysis procedure 
is available that is appropriate to the design and model and is sufficiently efficient, 
accurate (unbiased), precise, understandable, convincing, and inexpensive for 
the purposes of the investigation. Later parts of this report will present and 
discuss some of the analysis principles that may be and are being used to analyze 
and interpret random balance data. We stress the point that the reader should 
try to separate in his thinking his questions regarding random design from his 
questions regarding the analysis of random data. There is in practice so little 
connection between the problems of design and the problems of analysis that 
confusion is likely to occur if one allows oneself to think about both types of 
problems simultaneously. 

Among some users of random design there is developing a trend to deemphasize 
analysis problems during the planning and experimentation periods. Such users 
stress the principle that first importance be given to assuring that all information 
which may be pertinent to the investigation will in fact exist in the data. If the 
required information does in fact exist in the data, it is seldom difficult to extract 
that information by appropriate informal or formal analysis methods. 


Part II. RANDOM ANALYSIS WITH CLASSICAL ASSUMPTIONS


In Part I, Random Design, we defined several types of random designs for 
experimental programs, we listed some of the problems created by random 
designs, and we listed some of the advantages motivating the use of random 
designs. In this Part II, Random Analysis with Classical Assumptions, we 
shall discuss the basic methods available for analysis of data collected by random
designs.


ASSUMPTIONS


The discussions in Part II will be restricted to situations for which the usual 
assumptions of classical statistical analysis may be assumed to hold. The reader
will require mental discipline in reading this Part II. For many applications of
random design the classical assumptions may be, and often are, quite unrealistic. 
Thus the use of the methods in Part II without consideration of the assumptions, 
without supplementary analyses to check the assumptions, and without modifica- 
tions to correct for unjustified assumptions may, and often will, lead to invalid 
conclusions. Quite often assumption considerations dominate the methods of 
analysis used. Later parts of this report will discuss the assumptions in detail 
and the modifications desirable because of assumption considerations. 

Specifically, the assumptions that will be assumed to hold in Part II, Random 
Analysis with Classical Assumptions, are the following: 

(a) Residual variation. The total effect of all variables assigned to residual 
variation in the model, 


$$y = a_0 + a_1 x_1 + a_2 x_2 + \cdots + e,$$


leads to residual e-elements that are:


(i) Statistically independent of each other (except for restrictions whose only 
mathematical effect is to modify the degrees of freedom). 

(ii) Homogeneous; that is, each e-residual is a random sample from the same 
statistical distribution. Specifically, the e-residuals are random samples 
from distributions with homogeneous variance (i.e., the e-residuals are 
assumed to all have the same variance). 

(iii) Unbiased, that is, the mean of the true distribution for each e-residual 
is zero. 

(iv) Statistically independent of every design x-variable in the model. In practice 
most analysis methods require only that the e-residuals have zero expected 
correlation with every x-variable. 

(v) Normally distributed; that is, each e-residual is a random sample from a 
normal distribution, 


$$f(e)\,de = (2\pi\sigma^2)^{-1/2} \exp(-e^2/2\sigma^2)\,de.$$


(b) Interpretation independence. In complex investigations there are usually 
a large number of interpretations based on the same experimental data. Part II 
assumes that each interpretation is independent of every other interpretation 
made in the investigation. Thus in an investigation involving temperature and 
pressure as input x-variables, the interpretation of the effect of pressure is not in- 
fluenced by the specific evaluation of the temperature effect obtained. In other 
words, we shall assume in this Part II that it is valid and satisfactory to com- 
pletely ignore all consideration of multiple-test bias and multiple-interpretation 
bias. The reader is again warned that in complex investigations, by either 
classical designs or random designs, multiple-test bias is often strongly and 
inherently present in the desired interpretations. A statistical analysis which 
ignores multiple-test bias in these situations may be, and usually will be, seriously 
incorrect, invalid, and misleading. The analysis methods in Part II will then be 
insufficient. (Note: Do not confuse the interpretation independence here considered
with statistical independence of parameter estimates used in different
interpretations. Such parameter estimates and the interpretations based on them
may be, and often will be, statistically correlated.)

(c) Pooling Bias. For the analysis of complex data we may assume a model,


$$y = a_0 + a_1 x_1 + a_2 x_2 + \cdots + e,$$


as a basis for the analysis and the associated mathematics. In Part II we shall 
assume that, for each desired interpretation (see (b) above), a specific model 
and a specific method of evaluating that model is chosen a priori. In other words, 
we assume that the model and analysis choice for any interpretation is unin- 
fluenced by any other evaluation, analysis, or interpretation based on the 
same data. Relaxation of this assumption creates a possibility, and usually a 
probability, that a pooling bias will be introduced into the calculated significance 
levels and confidence risks. Moreover, there is a tendency for the effects of such 
pooling bias to be in a non-conservative direction. As an investigation becomes 
more complex, the opportunities to increase the efficiency of both the experi- 
mental design and the data analysis by hindsight choices of models and analysis 
methods tend to become so numerous and important that pooling bias free 
methods may become impractical. In such situations the analysis methods 
discussed in Part II must, in general, be modified to correct the interpretations 
for pooling bias. (Note 1. The usual degrees of freedom correction does not 
correct for pooling bias. It corrects only for correlations introduced by restric- 
tions. An additional pooling bias correction is necessary. Note 2. Nothing in 
this discussion (c) implies that the same model and the same analysis method 
must be used for all the interpretations based on the same data. Indeed (b) 
above specifically implies that different models and analyses may be used for 
each interpretation.) 
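
The residual assumptions listed under (a) can be made concrete with simulated data. The sketch generates runs from a two-variable linear model with independent, zero-mean, equal-variance normal residuals and then checks the sample mean, variance, and correlation with $x_1$. The coefficient values, the level sets, the run count, and the seed are all hypothetical and chosen only for illustration.

```python
import math
import random


def pearson_r(u, v):
    """Ordinary product-moment correlation of two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return suv / math.sqrt(suu * svv)


def simulate_runs(n_runs, a0, a1, a2, sigma, seed=None):
    """Simulate y = a0 + a1*x1 + a2*x2 + e with classical residuals."""
    rng = random.Random(seed)
    runs = []
    for _ in range(n_runs):
        x1 = rng.choice([60, 70])
        x2 = rng.choice([15, 20])
        # Independent draw, zero mean, common variance, normal:
        # assumptions (i), (ii), (iii), and (v). Drawing e without
        # reference to x1 or x2 gives assumption (iv).
        e = rng.gauss(0.0, sigma)
        runs.append((x1, x2, e, a0 + a1 * x1 + a2 * x2 + e))
    return runs


runs = simulate_runs(5000, a0=10.0, a1=0.5, a2=-1.0, sigma=1.0, seed=3)
residuals = [r[2] for r in runs]
mean_e = sum(residuals) / len(residuals)
var_e = sum((v - mean_e) ** 2 for v in residuals) / (len(residuals) - 1)
r_e_x1 = pearson_r(residuals, [r[0] for r in runs])

print(round(mean_e, 3), round(var_e, 3), round(r_e_x1, 3))
```

In a real investigation the true residuals are not observable, so checks of this kind would have to be made on estimated residuals after a model has been fitted.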


CONFIDENCE LIMITS AND SIGNIFICANCE LEVELS 


We should, at this time, point out a property of significance levels and con- 
fidence limits that is often overlooked with resulting confusion. Significance 
levels, confidence limits, and confidence risks are not unique. This non-uniqueness 
in no way detracts from their validity. Firstly, significance levels and confidence 
limits are a function of the data from which they are calculated. Thus two men 
may each take their own data and calculate appropriate confidence limits.
These two sets of confidence limits are both 100 percent valid though they are in
fact numerically different. Secondly, even with the same data, confidence limits
are not unique. For example, with normal data both the arithmetic average
and the median are unbiased estimates of the true population mean. But
confidence limits based on the median will be numerically different from limits
based on the arithmetic average. Both sets of confidence limits are, however,
equally valid. In this example the average happens to be the most efficient
estimate of the population mean possible. But the median and its confidence
limits, while less efficient, are just as valid. Moreover, it is well known that the
true population distribution has to be only slightly non-normal (in the right
direction) for the median to be a better estimate of the population mean than
the average.
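
The comparison of the average and the median on normal data can be checked by simulation. The sketch draws repeated normal samples and records both estimates; both centre on the true mean, and the median shows the larger sampling variance. The sample size, number of trials, population parameters, and seed are assumptions of the sketch.

```python
import random
import statistics


def compare_estimators(n, n_trials, mu, sigma, seed=None):
    """Return the averages and medians of n_trials normal samples of size n."""
    rng = random.Random(seed)
    averages, medians = [], []
    for _ in range(n_trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        averages.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return averages, medians


averages, medians = compare_estimators(n=25, n_trials=4000, mu=10.0,
                                        sigma=1.0, seed=11)

# Both estimators centre on the true mean of 10.0 (unbiased) ...
centre_of_averages = statistics.fmean(averages)
centre_of_medians = statistics.fmean(medians)

# ... but the median varies more from sample to sample (less efficient).
variance_of_averages = statistics.variance(averages)
variance_of_medians = statistics.variance(medians)

print(round(centre_of_averages, 3), round(centre_of_medians, 3))
print(round(variance_of_medians / variance_of_averages, 2))
```

Limits built around the median are therefore wider on average than limits built around the arithmetic average, yet either set can be valid in the sense defined below.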







The point we are here making is that for a specific investigation any number of 
reasons may occur which cause the use of non-standard or inefficient analysis 
methods to calculate parameter estimates, significance levels, confidence limits, 
and risk coefficients. Assuming the analysis method is appropriate and is correctly 
carried out, the results of the analysis are valid even though the numerical 
values obtained differ from and may be less efficient (in the statistical sense) 
than standard analysis results. 

This point is important in connection with random balance since many 
random balance applications are made in situations where statistical efficiency 
is of minor importance. The result has been a proliferation of informal, approximate,
and inefficient analysis procedures in actual practice. These must not be
condemned outside of the context of the specific application to which they were 
applied. Engineering is concerned with economic efficiency and this can be 
judged only with respect to the total economic aspects of the specific application. 

Definition of Confidence Limits. Before discussing specific analysis techniques, 
we should give a rigorous statement of the definition of confidence limits. The 
suggested analysis techniques are to be judged only with respect to the fact 
that they do, or do not, produce unbiased estimates with valid confidence limits. 
Unless otherwise stated, no claim is made that the techniques discussed are 
efficient in the statistical sense. 

Throughout this report, the definition of confidence limits and the associated
risk level will be specific and as follows:

Confidence Limits for a parameter with true value, $a$, and for a risk level, $\alpha$, are
numerical limits, $\hat{a} \pm L$, calculated from the data associated with an investigation
that have the following properties:


(i) A hypothetical population of repetitions of the investigation is defined. 

(ii) The routine for calculation of the confidence limits (intervals) is specif- 
ically defined. 

(iii) Routine (ii) applied to each member of the hypothetical investigation 
population, (i), generates a hypothetical population of confidence 
intervals. 

(iv) Statistical distributions are assumed and sampling rules specified for 
various conditions of the investigation, for various elements of the 
mathematical model and for various steps in the calculation routine. 

(v) Each hypothetical repetition of the investigation, (i), and each hypothet- 
ical confidence limit calculation repetition, (ii), involves new selections 
of all statistical values in accordance with the distributions and sampling 
rules in (iv). 

(vi) The generated hypothetical population of confidence intervals is such
that a portion of the total number of such hypothetical intervals equal
to the risk level, $\alpha$, fails to include the true population parameter, $a$.

(vii) This result, (vi), is independent of the unknown true a-value estimated 
and independent of the true values of other parameters assumed to be 
fixed but unknown. 


The definition of confidence limits (confidence intervals) as here stated is
deliberately more general than that usually given in textbooks. It is this wider
generality of the definition that creates the opportunity for random experimental
designs and other new principles of polyvariable analysis. The key generalization
is item (iv) of the definition:


(iv) Statistical distributions are assumed and sampling rules specified for 
various conditions of the investigation, for various elements of the mathe- 
matical model, and for various steps in the calculation routine. 


In classical statistical techniques statistical distributions are placed only on 
the residual e-elements in the model. The repeated investigations and associated 
calculations are identical except as influenced by new samplings of these residual 
e-elements. Our definition allows more freedom. It allows statistical distributions
on the x-elements in the model. This leads to random designs. Each repeated
investigation then has a new X’-design matrix based on a new random selection
of x-elements. Our definition allows statistical distributions on the true values
of the unknown a-parameters in the model. This leads to other new principles
of statistical analysis, specifically the polygression technique.

This report is confined to random designs and their analysis. Thus we shall
be concerned only with statistical distributions and random samplings associated
with e-residual elements in the model and x-elements in the X’-design matrix.
The hypothetical repeated investigations called for in the definition of confidence
limits will be identical to each other except as influenced by new random samplings
of the e- and x-elements.
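
The definition can be illustrated by a small simulation. The following is only a sketch under assumptions that are ours, not the paper's: a single design variable with a straight-line model, two-level x-elements chosen at random, normal e-residuals with a known standard deviation, and limits of the form $\hat{a} \pm L$ with $L$ taken from the normal distribution. The function name and all numerical values are hypothetical. Each repetition draws new x- and e-elements, and the fraction of intervals missing the true value should be close to the risk level whatever the true value is.

```python
import random
from statistics import NormalDist


def miss_fraction(a_true=2.0, a0=5.0, sigma_e=1.0, n=12,
                  alpha=0.10, repetitions=4000, seed=1):
    """Fraction of hypothetical repetitions whose interval misses a_true."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided normal multiplier
    misses = 0
    for _ in range(repetitions):
        # New random selection of x-elements for every repetition.
        while True:
            x = [rng.choice((-1.0, 1.0)) for _ in range(n)]
            x_bar = sum(x) / n
            sxx = sum((xi - x_bar) ** 2 for xi in x)
            if sxx > 0.0:  # redraw a degenerate design (all x equal)
                break
        # New random sampling of the e-residual elements.
        y = [a0 + a_true * xi + rng.gauss(0.0, sigma_e) for xi in x]
        y_bar = sum(y) / n
        a_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
        half_width = z * sigma_e / sxx ** 0.5  # the L in a_hat +/- L
        if abs(a_hat - a_true) > half_width:
            misses += 1
    return misses / repetitions


if __name__ == "__main__":
    print(miss_fraction())              # expected to be near alpha = 0.10
    print(miss_fraction(a_true=-7.0))   # property (vii): still near 0.10
```

Changing `a_true` or `a0` should leave the miss fraction near the chosen risk level, which is what requirement (vii) asks for.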

Significance levels will be defined in terms of confidence limits. The significance
level, $\alpha$, associated with a significance test for a parameter, $a$, is numerically equal
to the $\alpha$-risk level associated with a confidence limit calculation that restricts
the numerical value of the appropriate confidence limit to a prechosen value,
$a_0$, of the parameter, $a$. That is, the significance level, $\alpha$, is the $\alpha$-risk level for an
appropriate confidence interval that just touches the test value, $a_0$. (Note: If
the test value $a_0$ is not specified, a test value of $a_0 = 0$ is generally implied.)
The significance level is a measure of the incompatibility of the data with an
assumption that the true value of the unknown parameter, $a$, is in fact $a_0$.
Our definition has the mathematical advantage that a significance level is a 
special case of a confidence limit risk level and therefore has mathematical 
characteristics identical to those for risk levels. 

We should note in passing that requirement (vii), independence with respect 
to the true values of all fixed but unknown parameters, is a strong restriction. 
This restriction, (vii), may prevent the existence of confidence limits. 


GENERAL PRINCIPLES OF RANDOM BALANCE ANALYSIS 


Experience has shown no need to revise the original statements of the general 
principles governing random balance analysis. Quoting from the Proceedings of 
the Rutgers Quality Control Conference, September 1956 (American Society 
for Quality Control, Milwaukee, Wisconsin), we find: 




“The analysis of random balance designs depends on the fact that 
the effect of each variable is random with respect to the effects of all 
other variables. Therefore any desired analysis can be made on any 
(sufficiently small) set of variables ignoring all the others. The ignored 
variables are valid residual error for the analysis being made.” 


The writer recently (1958) discovered an internal communication written 
while with the General Electric Company (about 1950) on random balance 
which apparently had no effect on the writer or anyone else. From it we have 
the quotation (parentheses are modifications because of context): 


“The analysis of results is then made in the regular manner for each 
(subset of variables) without consideration of the other variables. 
The effect of this superimposed randomization (i.e., random balance) 
is to throw into the error component of variance the total (effects) 
of all variables not included in the (subset) being analyzed. Therefore 
every (subset) should include all variables (whose effects) are expected 
to be large compared to the expected error variance. --- If a variable 
expected to (have a small effect) turns out to (have a large effect), 
it may seriously inflate the error term of many of the (subsets). In 
such cases, the effect of this inflation may be reduced by making an 
analysis of co-variance with respect to that variable. In any case, the 
discovery of such unexpectedly large (effects) is usually one of the 
chief purposes of the experiment and any difficulties they introduce is 
the price we pay for any previous ignorance of them.” 


The only deficiency in either statement is a lack of reference to the pooling bias 
that will be introduced if the data being analyzed are allowed to influence the 
choice of variables for subsets (or the choice of co-variance variables). In Part 
II, however, we assume that such choices are a priori or that the analyst is 
willing to accept the unevaluated risk of pooling bias inaccuracy. 

There is currently (1959) a misunderstanding regarding random balance 
analysis. This is an assumption that random balance implies a specific technique 
of analysis. Users of random balance are continually asked ‘How do you analyze 
random balance data?’ The trouble is that this question implies an answer 
which is contrary to fact. It implies the existence of a specific and unique random 
balance analysis technique. The original statements above, however, deny that 
there is a specific unique technique. They state that any analysis technique may 
be applied to any subset of variables ignoring all other variables. The techniques 
used to analyze random balance data include practically all the techniques 
used to analyze data without random balance properties. If hindsight decisions 
are allowed to introduce pooling bias, the same techniques may be used to 
adjust for pooling bias as are used in the analysis of any other data. 

Among users of random balance there is a diversity of analysis techniques. 
The tendency is to carry over techniques from past experience. Thus those with 
little statistical experience may prefer informal graphical analysis; those with
quality control experience may use control chart, multi-vari chart, and other
quality control methods; those with introductory statistical training may
calculate “t” or “F” tests; those using desk calculators may prefer analysis of
variance methods (they therefore tend to imbed exact balance designs in the
random balance design); those with access to electronic computers, on the other
hand, tend to use multiple regression analysis almost exclusively (since inversion
of matrices is not a problem and analysis of variance programming is often
quite difficult).

The introduction of random balance has, of course, caused the invention and 
development of new analysis techniques that are particularly useful with (or 
are even restricted to) random designs, at least for specialized applications. 
The “pick-the-winner” analysis is an outstanding example. Another influence 
on analysis methods is the fact that random balance is often used in applications 
where efficiency in the statistical sense is of minor importance. Such applications 
tend to develop and use analysis methods that are quite informal and inefficient 
by the standards of classical statistics. Sometimes such applications tolerate 
analysis methods that are inaccurate and biased. 

The analysis of random balance data is further confused by the fact that 
random balance may be the first substantial contact of the user with statistical 
methods. This contact may be as slight as a one hour lecture or a two page 
article in a trade journal. “A little knowledge is a dangerous thing.” Such users, 
while learning by experience, can and do commit the sins that have plagued 
statistical methods from the beginning. “Experience is a hard teacher,” though
often an effective and necessary one. 

With this long introduction we shall now define and discuss several specific 
analysis methods with particular reference to their use with random balance 
data. Remember that in this Part II we assume that: 


(a) The usual classical assumptions apply to the residuals. 
(b) Pooling bias, if present, may be ignored. 
(c) Multiple test bias, if present, may be ignored. 


PICK-THE-WINNER ANALYSIS


The pick-the-winner technique for data interpretation is very old. Its formaliza- 
tion as a valid technique of statistical analysis is, however, quite new. (Doctoral 
thesis by Samuel Brooks under the direction of W. G. Cochran, Johns Hopkins 
University, 1953.) 

The formal pick-the-winner analysis technique is applicable only to data 
collected by a pure random design. (If the design has random balance, this is 
only incidental.) A domain of investigation is defined. The experimental points 
are a pure random sample of all possible combinations of input variable values 
in this investigation domain. The purpose of the investigation is to optimize 
the output variable (or some function of several output variables). The analysis 
has two steps: 







(a) From the list of output variable values obtained, identify that value 
which is most optimum. 

(b) Accept the input variable combination associated with this most optimum 
output as a near optimum combination of input variables. 


The usual confidence statement calculation for a pick-the-winner analysis 
assumes that duplicate experiments will reproduce output variable values 
exactly; that is, the experimental error variance is zero. The confidence state- 
ment is obtained directly from an appropriate tabulation of the binomial distri- 
bution. The form and meaning of the confidence statement can be best commu- 
nicated in terms of the example indicated on Table IIa. 


With 43 experiments, we are 90 percent confident that the input 
variable combination associated with the observed “best” output result 
is in the “best’’ 5 percent of the input variable domain investigated. 


TABLE IIa
Pick-the-Winner Analysis
Number of Data Sets

Relative Size of                    Confidence Level
Optimum Region     80 percent   90 percent   95 percent   99 percent

10 percent             ...          22           29          ...
 5                     ...          43           59           91
 2.5                   ...         ...          ...          ...
 1                     ...         230          299          ...
 0.5                   ...         460          ...          ...


Such confidence statements are valid subject only to the single assumption 
that the reproducibility errors among duplicate experiments are zero. The 
other inputs to the calculation and its theory are: 


(a) The number of experiments. 
(b) The random selection of the experimental points. 


These are known facts, not assumptions. Note that the confidence statement 
is in no way dependent on the number of variables in the investigation. A 
pick-the-winner analysis is valid for one variable or for a thousand variables. 
Also the confidence statement is in no way dependent on the true model expressing 
the functional relationship of output variables to input variables. It is valid for 
any true model whatever. 
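
The binomial argument behind such a statement can be sketched as follows. This is our own minimal reading of the calculation, not a procedure quoted from the thesis: with zero reproducibility error, each random point is taken to fall in the best region of relative size p independently with probability p, so the winner lies in that region unless all n points miss it. The function names are hypothetical.

```python
import math


def confidence_of(n, region_fraction):
    """Probability that at least one of n random points lies in the best region."""
    return 1.0 - (1.0 - region_fraction) ** n


def runs_needed(region_fraction, confidence):
    """Smallest n for which confidence_of(n, region_fraction) >= confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - region_fraction))


if __name__ == "__main__":
    for p in (0.10, 0.05, 0.01, 0.005):
        print(p, [runs_needed(p, c) for c in (0.80, 0.90, 0.95, 0.99)])
```

Under this formula `runs_needed(0.10, 0.90)` is 22, `runs_needed(0.05, 0.95)` is 59 and `runs_needed(0.01, 0.90)` is 230, in agreement with entries legible in Table IIa. It gives 45 rather than 43 for a 5 percent region at 90 percent confidence (43 runs correspond to roughly 89 percent), so the published table was presumably computed or rounded somewhat differently.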

The writer knows of no attempt to date to develop an efficient and valid pick- 
the-winner confidence statement calculation applicable when the reproducibility 
error variance is not zero. We do know that the standard procedure is valid for

$$ f = \frac{\text{reproducibility variance}}{\text{total variance}} = 0. $$

We also know that the standard procedure is not conservative for f larger than 
zero and that as f approaches one the true confidence coefficient becomes equal 
to the optimum domain relative size. (The winner point is random.) 

On the other hand, it may be inherent in most optimization investigations 
that the importance of locating a true optimum combination of input variable 
values decreases rapidly as the reproducibility error variance increases. Thus 
it may be quite satisfactory to broaden the optimum region defined by relative 
size to include all true output values departing from a boundary value by less 
than an appropriate fraction of the error variation. The standard confidence 
statement could then be used regardless of the true reproducibility error. 

Some such change in the definition of the optimum region is desirable. Note 
the confidence level discontinuity at f = 1: 


$f \to 1$: Confidence level $\to$ relative size
$f = 1$: Confidence level $= 100$ percent


(This 100 percent confidence results for f = 1 because all points in the investi- 
gation domain are then optimum.) An appropriately defined optimum domain 
should have the confidence level approach 100 percent as f approaches one. 
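
The behavior of the standard statement for f larger than zero can be explored by simulation. The sketch below rests on assumptions of our own: true outputs over the domain are taken as normal with variance 1 - f, reproducibility errors as normal with variance f, and the optimum region as the upper tail of the true output with the stated relative size. The function name and default values are hypothetical, and f must be less than one.

```python
import random
from statistics import NormalDist


def winner_confidence(f, n=43, region_fraction=0.05, trials=4000, seed=7):
    """Fraction of trials in which the observed winner is truly in the best region."""
    rng = random.Random(seed)
    sd_true = (1.0 - f) ** 0.5   # spread of the true outputs over the domain
    sd_noise = f ** 0.5          # reproducibility error
    # Boundary of the best region on the true-output scale (requires f < 1).
    cutoff = NormalDist(0.0, sd_true).inv_cdf(1.0 - region_fraction)
    hits = 0
    for _ in range(trials):
        best_obs, best_true = None, None
        for _ in range(n):
            t = rng.gauss(0.0, sd_true)
            obs = t + rng.gauss(0.0, sd_noise)
            if best_obs is None or obs > best_obs:
                best_obs, best_true = obs, t
        if best_true > cutoff:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for f in (0.0, 0.25, 0.5, 0.9, 0.9999):
        print(f, winner_confidence(f))
```

With f = 0 the result should sit near the binomial value for 43 runs and a 5 percent region; as f approaches one it should fall toward the relative size itself, since the winner is then essentially a random point.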

The pick-the-winner method as defined above is only one application of the 
pick-the-winner principle of statistical analysis. Methods based on this principle 
are non-parametric and have both the advantages and disadvantages of other 
non-parametric methods of analysis. The advantages are usually related to 
simplicity and assumption freedom. The disadvantages are usually related to 
inefficiency and failure to evaluate parameters in an assumed model. 

The opportunities to develop special pick-the-winner methods for specific 
investigative needs are numerous. Some of the factors creating these opportunities 
are: 

(a) A few exact duplicate experiments may be, and often are, run to evaluate
the reproducibility error. (Note: If the experiments to be duplicated are not
selected at random, the error estimate may be, and usually will be, biased.)

(b) If the reproducibility error is not zero, the winner selection involves 
what is essentially a multiple test bias. This bias will be reduced if the winner 
or near winner tests are duplicated. 

(c) The efficiency of pick-the-winner will be increased if the random selection 
of experimental points is from a non-uniform distribution over the investigative 
domain (provided, of course, that the distribution used does in fact concentrate 
experimentation in the more promising regions of the domain.) 

(d) Feedback (sequentialization) is an effective and safe method to guide 
distribution modifications in (c). Such feedback leads to random evolutionary 
operation techniques in an unlimited number of variables. 

(e) If multiple winners (e.g., the best 5 percent of the experiments) are con- 
sidered, the domain covered by these winners defines approximately the optimum 
region with the same relative size. 







(f) Appropriate combinations of (c), (d), and (e) give efficient techniques
for simultaneous optimization of the many tolerances associated with a complex
process, product, or system.

Regarding the efficiency of pick-the-winner methods, it must first be recog- 
nized that they have the inefficiencies that are inherent in random designs and 
inherent for any non-parametric analysis. On the other hand, it should be 
remembered that non-random designs and parametric analyses gain efficiency 
only because a meaningful model is assumed a priori. If the model used is in 
fact the approximate true relationship, efficiency is improved. If, however, the 
model is inappropriate, efficiency is lost. In extreme cases, an inappropriate 
model may lead to zero (or negative !) efficiency in the sense that the inter- 
pretations arrived at are just plain wrong. One is then worse off after the investi- 
gation than he was before the investigation. 

Brooks in his thesis investigated by Monte Carlo studies the efficiency of
pick-the-winner in relation to other optimum seeking methods (steepest ascent,
analysis of variance, etc.). His examples assumed a known true model of simple
form so that model error and multiple maximum inefficiencies did not operate 
against the standard methods. Also his examples involved only two variables so 
that exact balance designs did exist for the standard methods. (That is, the 
random design inefficiency was acting at a maximum level against pick-the- 
winner.) Moreover pick-the-winner was not sequentialized, which makes the 
comparison with sequentialized steepest ascent quite unfair. In his studies 32 
experiments were run for each investigation. 

The relative inefficiencies of pick-the-winner, as evaluated by Brooks, were 
quite minor. Pick-the-winner was seldom more than 10 percent inefficient 
relative to the best of the other methods studied. Brooks’ inefficiencies are well 
within the maximum inefficiency to be expected from use of a pure random 
design; that is, 30 percent inefficiency for 32 experiments (see Part I). In other 
words, the pick-the-winner method probably cancelled some of the inefficiencies 
existing for the standard optimum seeking methods. It appears that pick-the- 
winner would have shown relative efficiencies well over 100 percent if the examples 
used had not been so favorable to the standard methods. Specifically, pick-the- 
winner would have had higher efficiencies if more variables were investigated, 
if the true model was a non-simple function, and if a sequential pick-the-winner 
design was used. 


MULTIPLE REGRESSION ANALYSIS 


In classical statistical analysis, multiple regression is the standard analysis 
method when a linear model, 


$$ y = a_0 + a_1 x_1 + a_2 x_2 + \cdots + e $$


may be assumed to represent the data. Multiple regression is a 100 percent 
efficient analysis for such models. Analysis of variance and other specialized 
calculation routines are special cases of multiple regression and therefore need 
not be considered in a general discussion. Multiple regression is the calculation
routine usually used when electronic computers are available even for data
appropriate for analysis of variance calculations.

These statements regarding multiple regression hold for analysis of random 
design data also. If a linear model, 


$$ y = a_0 + a_1 x_1 + a_2 x_2 + \cdots + e $$

is assumed, multiple regression is the standard analysis method. The calculation
routines and the interpretations are in no way different than they would be for
any other data. (Remember, of course, that in this Part II we ignore multiple
test and pooling bias.)

Let us assume a split-model,

$$
\begin{aligned}
y = {} & a_1 u_1 + a_2 u_2 + \cdots \\
       & + b_1 x_1 + b_2 x_2 + \cdots \\
       & + c_1 z_1 + c_2 z_2 + \cdots \\
       & + e
\end{aligned}
$$

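The design matrix for this model is

$$
(U', X', Z', E') =
\begin{pmatrix}
u_{11} & u_{21} & \cdots & x_{11} & x_{21} & \cdots & z_{11} & z_{21} & \cdots & e_1 \\
u_{12} & u_{22} & \cdots & x_{12} & x_{22} & \cdots & z_{12} & z_{22} & \cdots & e_2 \\
\vdots & \vdots &        & \vdots & \vdots &        & \vdots & \vdots &        & \vdots \\
u_{1n} & u_{2n} & \cdots & x_{1n} & x_{2n} & \cdots & z_{1n} & z_{2n} & \cdots & e_n
\end{pmatrix}
$$
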
Let us also assume that the z-variable elements and the e-residual elements
have the necessary types of random balance with respect to the x-variable
elements. (Note: The random balance requirements for a multiple regression
analysis are less restrictive than the definition in Part I as we shall shortly
determine.) We also assume that the number of a and b-unknowns in the model
is less than the number of data sets. (Specifically, we assume that the matrix,
$(U', X')(U', X')'$, is not singular and therefore has an inverse.)

For simplicity we shall in the following discussion use matrix notation. The 
model is then 


$$ Y' = AU' + BX' + CZ' + E' $$

Let us first consider the usual multiple regression estimates for the A-unknowns,

$$
\begin{aligned}
\hat{A} &= Y'U(U'U)^{-1} \\
        &= (AU' + BX' + CZ' + E')\,U(U'U)^{-1} \\
        &= A + BX'U(U'U)^{-1} + (CZ' + E')\,U(U'U)^{-1}
\end{aligned}
$$


These AU’-effects will be called nuisance effects (variables). These nuisance
effects exist in the data and are to be removed in the analysis. We are not,
however, interested in unbiased estimates of the A-coefficients of these effects,
at least in the current analysis. (The calculated $\hat{A}$-estimates must be interpreted
with extreme caution since no balance requirements are set between the U’
and the (Z’, E’) variables.) The constant term, $a_0$, is almost always an AU’-nuisance
effect and therefore $\hat{a}_0$ should be interpreted with caution.

Let us now remove the AU’-nuisance effects from the Y’-output data results:

$$ \Delta Y' = Y' - \hat{A}U' $$

The $\Delta Y'$-values are the standardized output values which would have been
obtained if the U’-variables had been held constant (at U’ = 0 values) during
the experiments. In terms of the other model elements these $\Delta Y'$-values are

$$
\begin{aligned}
\Delta Y' &= Y' - \hat{A}U' \\
          &= Y' - [Y'U(U'U)^{-1}]U' \\
          &= Y'(I - U(U'U)^{-1}U') \\
          &= (AU' + BX' + CZ' + E')(I - U(U'U)^{-1}U') \\
          &= A[U' - U'U(U'U)^{-1}U'] + \cdots \\
          &= A(U' - U') + \cdots \\
          &= 0 + (BX' + CZ' + E')(I - U(U'U)^{-1}U')
\end{aligned}
$$

Let us now define

$$
\begin{aligned}
\Delta X' &= X'(I - U(U'U)^{-1}U') \\
\Delta Z' &= Z'(I - U(U'U)^{-1}U') \\
\Delta E' &= E'(I - U(U'U)^{-1}U')
\end{aligned}
$$

Then

$$ \Delta Y' = B(\Delta X') + C(\Delta Z') + \Delta E' $$


This $\Delta Y'$-model may now be analyzed by multiple regression to obtain estimates
of the B-unknowns:

$$
\begin{aligned}
\hat{B} &= (\Delta Y')(\Delta X)(\Delta X'\,\Delta X)^{-1} \\
        &= B + (C\,\Delta Z' + \Delta E')(\Delta X)(\Delta X'\,\Delta X)^{-1}
\end{aligned}
$$

The necessary and sufficient condition that these $\hat{B}$-estimates be unbiased estimates
of the B-unknowns is that

$$
\begin{aligned}
\mathcal{E}(\Delta Z')(\Delta X) &= 0 \\
\mathcal{E}(\Delta E')(\Delta X) &= 0
\end{aligned}
$$

where $\mathcal{E}$ indicates the mathematical expectation (expected value, average value).
The usual requirement that the e-residual elements are random from a population
with mean zero is a sufficient condition to assure $\mathcal{E}(\Delta E')(\Delta X) = 0$.
A sufficient condition for $\mathcal{E}(\Delta Z')(\Delta X) = 0$ is that the Z-variables be of the form

$$ Z = DU + F $$

where the D-coefficients are arbitrary constants and the f-elements are random
from a population with mean zero.

Also it is easily shown that $\hat{B}$-estimates identical to the above can be obtained
by the joint multiple regression calculation for the U and X-variables:

$$ (\hat{A}, \hat{B}) = Y'(U', X')'\,[(U', X')(U', X')']^{-1} $$

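
The identity of the two routes (remove the nuisance effects first and then regress, or regress jointly on the U and X-variables) can be checked numerically. The sketch below uses a small made-up data set and hand-written least-squares helpers; the numbers, the function names and the use of the normal equations are all our own illustrative choices. The equality of the two sets of estimates is the algebraic result sometimes called the Frisch-Waugh-Lovell theorem.

```python
def solve(m, rhs):
    """Solve m b = rhs by Gauss-Jordan elimination with partial pivoting."""
    k = len(m)
    aug = [row[:] + [r] for row, r in zip(m, rhs)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def regress(columns, y):
    """Least-squares coefficients of y on the given columns (normal equations)."""
    gram = [[sum(p * q for p, q in zip(ci, cj)) for cj in columns] for ci in columns]
    rhs = [sum(p * q for p, q in zip(ci, y)) for ci in columns]
    return solve(gram, rhs)


def residualize(v, u_columns):
    """v with its least-squares fit on the u-columns removed: v(I - U(U'U)^-1 U')."""
    coef = regress(u_columns, v)
    return [vi - sum(c * col[i] for c, col in zip(coef, u_columns))
            for i, vi in enumerate(v)]


def compare():
    # Illustrative, made-up data: two nuisance variables (a constant and a trend)
    # and two design variables at two levels.
    u1 = [1.0] * 8
    u2 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    x1 = [1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0, -1.0]
    x2 = [-1.0, -1.0, 1.0, -1.0, 1.0, 1.0, -1.0, 1.0]
    y = [3.1, 0.4, 6.2, 5.0, 2.9, 3.8, 7.7, 4.1]
    u = [u1, u2]
    # Two-stage route: Delta-Y regressed on Delta-X.
    b_two_stage = regress([residualize(x1, u), residualize(x2, u)], residualize(y, u))
    # Joint route: regress on (U, X) together and keep the X-coefficients.
    b_joint = regress([u1, u2, x1, x2], y)[2:]
    return b_two_stage, b_joint


if __name__ == "__main__":
    print(compare())
```

The two coefficient lists should agree to rounding error, which is why the simpler joint calculation can be used in practice.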
Summarizing the foregoing algebra in simpler language gives the following: 


(a) The relationship between the input and output variables is assumed to 
be linear in the unknowns and represented by the split model, 


$$
\begin{aligned}
y = {} & a_1 u_1 + a_2 u_2 + \cdots \\
       & + b_1 x_1 + b_2 x_2 + \cdots \\
       & + c_1 z_1 + c_2 z_2 + \cdots \\
       & + e
\end{aligned}
$$

(b) The u-variables are nuisance variables whose effects are to be eliminated
but not evaluated. The x-variables are design variables whose effects are to be
both eliminated and evaluated. The z-variables are residual variables whose
effects are to be neither eliminated nor evaluated. The e-residual elements
represent the unidentified causes of variation.

(c) The combined number of u and x-variables must be less than the number
of data sets and there must be no exact linear relationships among them.

(d) The b-unknowns may be estimated by the usual multiple regression calculation
on the u and x-variables combined:

$$ (\hat{A}, \hat{B}) = Y'(U', X')'\,[(U', X')(U', X')']^{-1} $$


(e) Sufficient conditions for these $\hat{b}$-estimates to be valid (i.e., unbiased) are

(i) The z-residual variables may have linear random unbalance with respect
to the u-nuisance variables and with respect to other z-residual variables.
That is, each z-variable may be of the form,

$$ z = d_1 u_1 + d_2 u_2 + \cdots + h_1 z_1 + h_2 z_2 + \cdots + f, $$

where the d and h coefficients are arbitrary constants with any desired
values (including zero).

(ii) A sufficient condition for all z-residual variables to have this relationship,
(i), with a specific u-nuisance variable is that the u-variable have linear
random unbalance with respect to the u and z-variables:

$$ u = d_1 u_1 + d_2 u_2 + \cdots + h_1 z_1 + h_2 z_2 + \cdots + f. $$


(iii) The f and e-residual elements in (a), (ei) and (eii) must be zero or be 
random selections from a population with mean zero independent of 
the x-design variables. 

(iv) Subject to (c) and (eii), any arbitrary relationships are allowable among 
the u and x-variables. 


(f) The $\hat{a}$-estimates for the nuisance u-variables incidentally calculated in
the multiple regression analysis are not, in general, unbiased estimates of the
a-unknowns in the model. Specifically they are unbiased estimates of

$$ \mathcal{E}(\hat{A}) = A + C\{Z'(U', X')'\,[(U', X')(U', X')']^{-1}\} $$

where the coefficients of the unevaluated c-residual unknowns are not, in general,
equal to zero. These $\hat{a}$-estimates of nuisance unknowns must, therefore, be
interpreted with extreme caution. In most statistical analyses the constant term,
$\hat{a}_0$, is a nuisance unknown and should therefore always be interpreted with
caution.
To make the above analysis specific, consider a simple example: 


(a) Data are available which are assumed to be adequately represented by 
the model 


$$ y = a_0 + a_1 x_1 + a_2 x_2 + e $$

(b) The e-residuals are assumed to be independent random samples from an
appropriate population.

(c) The $x_1$ and $x_2$ design variable elements are known to be independent random
samples from known populations.

(d) The population means for $x_1$, $x_2$, and $e$ are not necessarily zero and therefore
$a_0$ is a nuisance parameter for which valid unbiased estimates are impossible.

(e) The model can therefore be rewritten in terms of deviations from the
mean to remove the $a_0$-constant term effect:

$$ (y - \bar{y}) = a_1(x_1 - \bar{x}_1) + a_2(x_2 - \bar{x}_2) + (e - \bar{e}) $$

(f) To estimate the $a_1$-unknown we may split the model as follows:

$$
\begin{aligned}
(y - \bar{y}) &= a_1(x_1 - \bar{x}_1) + [a_2(x_2 - \bar{x}_2) + e] \\
              &= a_1(x_1 - \bar{x}_1) + f_1
\end{aligned}
$$

The $f_1$-elements in this split model satisfy the necessary requirements for a
valid regression calculation which gives the unbiased estimate of $a_1$,

$$ \hat{a}_1 = \frac{\sum (x_1 - \bar{x}_1)(y - \bar{y})}{\sum (x_1 - \bar{x}_1)^2} $$

(g) To estimate the $a_2$-unknown coefficient we may split the model differently
to give

$$
\begin{aligned}
(y - \bar{y}) &= a_2(x_2 - \bar{x}_2) + [a_1(x_1 - \bar{x}_1) + e] \\
              &= a_2(x_2 - \bar{x}_2) + f_2
\end{aligned}
$$

$$ \hat{a}_2 = \frac{\sum (x_2 - \bar{x}_2)(y - \bar{y})}{\sum (x_2 - \bar{x}_2)^2} $$

(h) These simple regression estimates are just as valid as the usual multiple
regression estimates

$$ \hat{A} = (Y'X)(X'X)^{-1} $$

This validity of $\hat{a}_1$ and $\hat{a}_2$ is, however, dependent on the known random balance
between $x_1$ and $x_2$. The $\hat{a}_1$ and $\hat{a}_2$ are not valid estimates for data in general.
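
A short simulation may make the dependence on random balance concrete. It is a sketch under our own assumptions: two-level design variables drawn at random, normal e-residuals with unit variance, and arbitrary true coefficients. The function names and the `balanced` switch are hypothetical. When $x_2$ is drawn independently of $x_1$ the simple estimate of $a_1$ averages out to the true value; when $x_2$ simply copies $x_1$ (no random balance) the same calculation estimates $a_1 + a_2$ instead.

```python
import random


def simple_slope(x, y):
    """Simple regression slope: sum (x - xbar)(y - ybar) / sum (x - xbar)^2."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx


def average_simple_estimate(a0=4.0, a1=2.0, a2=3.0, n=20,
                            repetitions=3000, balanced=True, seed=11):
    """Average of the simple estimate of a1 over repeated random designs."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(repetitions):
        while True:
            x1 = [rng.choice((-1.0, 1.0)) for _ in range(n)]
            if len(set(x1)) > 1:  # redraw a design with no variation in x1
                break
        if balanced:
            x2 = [rng.choice((-1.0, 1.0)) for _ in range(n)]  # independent of x1
        else:
            x2 = list(x1)  # completely confounded with x1
        y = [a0 + a1 * p + a2 * q + rng.gauss(0.0, 1.0) for p, q in zip(x1, x2)]
        total += simple_slope(x1, y)
    return total / repetitions


if __name__ == "__main__":
    print(average_simple_estimate(balanced=True))    # close to a1 = 2
    print(average_simple_estimate(balanced=False))   # close to a1 + a2 = 5
```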

The simple estimates, $\hat{a}_1$ and $\hat{a}_2$, may or may not be as efficient as the multiple
regression $\hat{A}$-estimates. The variance of the f-residuals will, of course, be larger
than that of the e-residuals, which tends to make the multiple regression
$\hat{A}$-estimates more efficient. On the other hand, the f-residuals have more
degrees of freedom and this may more than compensate for the increased f-variance.

Specifically, consider the case when the number of a-unknowns exceeds the
number of data sets. The multiple regression e-residuals then have

$$ (n - k) < 0 $$

degrees of freedom and the $\hat{A}$-estimates have the indeterminate form

$$ \frac{0}{0}. $$

The random balance simple regression $\hat{a}$-estimates, however, do exist, are valid,
are associated with $n - 2$ degrees of freedom, and have finite confidence limits.
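
The situation with more unknowns than data sets can be sketched as follows. All specifics are our own assumptions for illustration: 30 two-level variables in 16 runs, one large true effect, a few small ones, and unit-variance normal residuals. A joint multiple regression is impossible here, yet each simple regression slope can still be computed from its own column.

```python
import random


def simple_slope(x, y):
    """Simple regression slope of y on a single design column x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx


def screen(n=16, k=30, seed=3):
    """Simple-regression estimate for each of k variables from only n runs."""
    rng = random.Random(seed)
    true = [0.0] * k
    true[0] = 10.0                   # one large effect (hypothetical value)
    for j in range(1, 6):
        true[j] = 0.5                # a few small effects (hypothetical values)
    columns = []
    for _ in range(k):
        while True:
            col = [rng.choice((-1.0, 1.0)) for _ in range(n)]
            if len(set(col)) > 1:    # each column must vary
                break
        columns.append(col)
    y = [sum(c * col[i] for c, col in zip(true, columns)) + rng.gauss(0.0, 1.0)
         for i in range(n)]
    return [simple_slope(col, y) for col in columns]


if __name__ == "__main__":
    estimates = screen()
    print(len(estimates), round(estimates[0], 2))
```

Every variable receives a finite estimate, and the estimate for the dominant variable should lie near its true value, with the ignored variables acting as inflated residual error.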

This discussion of efficiency indicates situations for which a random balance 
design is desirable and efficient. These are situations where a standard multiple 
regression analysis becomes inefficient (or impossible) because of loss of residual 
degrees of freedom. In practice a random balance design and a split model 
analysis are desirable, in general, when the residual degrees of freedom in a 
multiple regression analysis are less than five or ten. 

When a standard multiple regression analysis indicates a negative number
of residual degrees of freedom, then such an analysis is impossible. A random
balance design with a split model analysis is a practical alternative for these
situations.

The inefficiency of a random balance analysis with the split model,

$$
\begin{aligned}
y &= b_1 x_1 + b_2 x_2 + \cdots + (c_1 z_1 + c_2 z_2 + \cdots + e) \\
  &= b_1 x_1 + b_2 x_2 + \cdots + f, \\
f &= c_1 z_1 + c_2 z_2 + \cdots + e,
\end{aligned}
$$

arises from the fact that the f-residuals have a larger variance,

$$ \sigma_f^2 = c_1^2 \sigma_{z_1}^2 + c_2^2 \sigma_{z_2}^2 + \cdots + \sigma_e^2, $$

than the variance, $\sigma_e^2$, of the e-residuals in the complete multiple regression
analysis. Thus a split model will be very inefficient whenever any cz-residual
variable effect has a variance, $c^2 \sigma_z^2$, which is a significant fraction of the total
$\sigma_f^2$-split model residual variance.
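
As a small arithmetic illustration of this variance, the sketch below adds up the contributions of the residual variables and reports the largest single share. It assumes, as random balance is meant to provide, that the z-variables are uncorrelated with each other and with e; the function names and the numbers in the usage lines are hypothetical.

```python
def split_residual_variance(c, var_z, var_e):
    """Variance of f = c1*z1 + c2*z2 + ... + e for uncorrelated z's and e."""
    return sum(ci * ci * vz for ci, vz in zip(c, var_z)) + var_e


def largest_share(c, var_z, var_e):
    """Largest fraction of the f-variance due to a single cz-residual effect."""
    total = split_residual_variance(c, var_z, var_e)
    return max(ci * ci * vz for ci, vz in zip(c, var_z)) / total


if __name__ == "__main__":
    # Two residual variables with coefficients 2 and 0.5, variances 1 and 4.
    print(split_residual_variance([2.0, 0.5], [1.0, 4.0], 1.0))  # 4 + 1 + 1
    print(largest_share([2.0, 0.5], [1.0, 4.0], 1.0))            # 4 of 6
```

A large share for one variable suggests moving that variable out of the residual and into the evaluated part of the split model, provided the choice is made a priori.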

Assuming more than ten residual degrees of freedom, the random balance 
split model analysis will be more efficient if 

cies es 

oO; df 
With less than ten residual degrees of freedom, the efficiency gain for random 
balance is even larger. 

We again remind the reader that the above discussion assumes that, for each 
a-unknown coefficient estimated, a split model is chosen a priori; that is, without 
examination of any $\hat{a}$-estimates of any a-unknowns. (Different model-splits
may, and usually will, be used for estimation of different a-unknowns.) If model- 
splits are made by hindsight, a pooling bias may, and usually will, be introduced. 

Interactions. In a multiple regression analysis, interactions are introduced
into the regression model by use of $x_i x_j$ product terms. For example,

$$
\begin{aligned}
y = {} & a_0 + a_1 x_1 + a_2 x_2 + a_3 x_3 \\
       & + a_{12} x_1 x_2 + a_{13} x_1 x_3 + a_{23} x_2 x_3 + e
\end{aligned}
$$

is a three variable model including all the first order interaction terms. Removal
of the effect of the $a_0$-constant term gives the model actually analyzed:

$$
\begin{aligned}
y - \bar{y} = {} & b_1(x_1 - \bar{x}_1) + b_2(x_2 - \bar{x}_2) + b_3(x_3 - \bar{x}_3) \\
                 & + b_{12}(x_1 - \bar{x}_1)(x_2 - \bar{x}_2) + b_{13}(x_1 - \bar{x}_1)(x_3 - \bar{x}_3) \\
                 & + b_{23}(x_2 - \bar{x}_2)(x_3 - \bar{x}_3) + (e - \bar{e}).
\end{aligned}
$$


An interesting and important property of random balance design is that
random balance among the main effect variables assures random balance between 
any main effect and any interaction and also assures random balance between 
any pair of interactions. 

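Random balance among the main effects implies

$$E(x_i - \bar{x}_i) = 0, \quad \text{all } i,$$
$$E(x_i - \bar{x}_i)(x_j - \bar{x}_j) = 0, \quad \text{all } i \neq j.$$

Random balance then exists between a main effect and an interaction since

$$E\{(x_i - \bar{x}_i)[(x_j - \bar{x}_j)(x_k - \bar{x}_k)]\} = [E(x_i - \bar{x}_i)][E(x_j - \bar{x}_j)(x_k - \bar{x}_k)] = 0,$$
$$E\{(x_i - \bar{x}_i)[(x_i - \bar{x}_i)(x_j - \bar{x}_j)]\} = [E(x_i - \bar{x}_i)^2][E(x_j - \bar{x}_j)] = 0.$$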





RANDOM BALANCE EXPERIMENTATION 
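Random balance also exists between any pair of interactions since

$$E\{[(x_i - \bar{x}_i)(x_j - \bar{x}_j)][(x_k - \bar{x}_k)(x_l - \bar{x}_l)]\} = [E(x_i - \bar{x}_i)][E(x_j - \bar{x}_j)(x_k - \bar{x}_k)(x_l - \bar{x}_l)] = 0,$$
$$E\{[(x_i - \bar{x}_i)(x_j - \bar{x}_j)][(x_i - \bar{x}_i)(x_k - \bar{x}_k)]\} = [E(x_i - \bar{x}_i)^2][E(x_j - \bar{x}_j)(x_k - \bar{x}_k)] = 0.$$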


Similar arguments apply for all higher order interactions. 
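The vanishing expectations can be checked by direct enumeration. The sketch
below is an illustration only: it assumes two-level variables coded -1 and +1
(so that every mean is zero) and takes all level combinations as equally
likely. The function name `expected_product` is hypothetical and does not come
from the paper.

```python
from itertools import product

def expected_product(n_vars, term_a, term_b):
    """Average of (product of the columns in term_a) times (product of the
    columns in term_b) over all 2**n_vars equally likely level combinations,
    with levels coded -1/+1 so that each variable has mean zero."""
    combos = list(product((-1, 1), repeat=n_vars))
    total = 0
    for levels in combos:
        value = 1
        for idx in term_a + term_b:
            value *= levels[idx]
        total += value
    return total / len(combos)

# main effect x0 against the interaction x1*x2
print(expected_product(4, (0,), (1, 2)))
# main effect x0 against the interaction x0*x1 (shared variable)
print(expected_product(4, (0,), (0, 1)))
# interaction x0*x1 against the interaction x0*x2 (shared variable)
print(expected_product(4, (0, 1), (0, 2)))
# interaction x0*x1 against the interaction x2*x3 (no shared variable)
print(expected_product(4, (0, 1), (2, 3)))
```

Only a term paired with itself gives a non-zero average under these
assumptions.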

When interactions are included in the model, the number of unknowns in- 
creases very rapidly as shown in Table IIb. 

Examination of Table IIb shows that for a polyvariable investigation (10 or 
more variables) with a reasonable number of experiments (50 to 100), a complete 
multiple regression analysis is impossible if interactions are to be checked. 
(The residual degrees of freedom will be negative.) Random balance designs 
with split model analyses offer a possible alternative for such investigations. 
Experience has shown that for some types of polyvariable investigations random
balance is a practical and efficient alternative as well as a possible and valid one.
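How quickly the number of unknowns grows can be sketched with binomial
coefficients. The counts below are computed, not transcribed from Table IIb,
and rest on two assumptions: a first order interaction is a product of two
variables (second order three, and so on), and the constant term is left out
of the count but costs one degree of freedom. The function names are
illustrative.

```python
from math import comb

def unknowns_in_model(n_variables, interaction_order):
    """Main effects plus all interactions up to the given order.
    Order 1 = two-variable products, order 2 = three-variable products, etc.
    The constant term is not counted (an assumption of this sketch)."""
    return sum(comb(n_variables, size)
               for size in range(1, interaction_order + 2))

def residual_df(n_runs, n_variables, interaction_order):
    """Runs, less one for the constant term, less the unknowns fitted."""
    return n_runs - 1 - unknowns_in_model(n_variables, interaction_order)

for n in (10, 20):
    print(n, [unknowns_in_model(n, order) for order in (1, 2, 3)])

# 50 runs, 10 variables, first order interactions included
print(residual_df(50, 10, 1))
```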

We must point out that in polyvariable investigations of this type both pooling 
bias and multiple test bias will usually be present in the desired interpretations of 
the data. Allowances should (must) be made for these biases. 

Similar arguments can be formulated with respect to models that include 
terms for quadratic and other non-linear effects. 


TABLE IIb
Number of Interactions 


Number of Unknowns in Model with: 


Number of First Order Second Order Third Order 
Variables Interactions Interactions Interactions 








Vol. 1, No. 2 TECHNOMETRICS May, 1959


The Application of Random Balance Designs


Thomas A. Budne
Great Neck, N. Y.


THE SCOPE OF APPLICATION


The Random Balance Experiment is frequently compared to and evaluated 
with the traditional designs which are well known in the statistical literature. 
However, its greatest value and advantages lie in an area in which comparison 
cannot readily be made. Experience has proved it to be a tremendously potent 
means of screening large numbers of variables in a limited number of samples 
to find the few which are the largest contributors to the effects under considera- 
tion. 

The limitations placed on sample size come from economic considerations 
and not from statistical ones. Such limitations become readily understandable 
when one considers an experiment in a foundry which requires that the entire 
manufacturing operation be conducted under the specified experimental con- 
ditions for a period of 16 days in order that only 32 samples may be taken. 
One can understand the reluctance by management to enter into such a venture 
without high assurance and confidence that the results will be profitable rather 
than disastrous. 

One can also understand that the management will not readily lend its manu- 
facturing operations to an enterprise whose plan is so complex that they must 
depend completely upon a statistician rather than a foundry man for the plan 
of action, for the analysis of the experimental results, and for conclusions and 
recommendations. 

Problems in manufacturing, research and development as well as in other 
areas frequently require the screening of large numbers of variables to find the 
critical few. It has become important that a relatively simple, effective and 
economically feasible technique be made available to technical personnel to 
use without the assistance of an expert statistician. The more usual alternative 
is that the technical man applies no statistical design at all. The Random Balance 
Design is the answer to this problem. 


EXPERIMENTAL RISKS 


We know the power function associated with almost all statistical tests of 
hypothesis. With stated null and alternative hypotheses and reasonable estimates 
of the residual error, we can specify an experimental design and a minimum 
sample size required to keep the two types of error at any predetermined levels. 

The statistical risks are associated with the stated hypotheses which concern 
themselves only with the selected group of variables. 




In the screening type of experiment, the most important risk is that of failing 
to include one or more of the critical variables in the experiment. This is not 
generally accepted as a statistical risk. 

This risk may not be a concern to many statisticians, but it is often a major 
concern to those whom they serve. It is a challenge to the statistical world to 
find the optimum economical and statistical means for reducing this risk. Random 
Balance Experiments are a long step in this direction. 


ASSUMPTIONS 
The need for screening experiments arises in the following kinds of problems: 


a. Isolating the variables which are the major contributors to a condition of 
excessive variation. 

b. Isolating the few variables which have the greatest effect on a measured 
characteristic. 


Experience in a large number of screening experiments in industrial situations 
has consistently shown that there are only a few critical variables and a large 
number of unimportant variables associated with each specific problem. There 
is limited practical value in attributing “statistical significance” to any number 
of the “unimportant” variables while one or more of the “critical few” variables
escape consideration. 

In the real world it becomes useful to assume that total variation and total 
effect can be broken into all of their components and that each component 
may be attributed to a particular variable or cause. In the light of experience, 
it is both practical and useful to make the assumption that a very few of these 
many variables or causes contribute a major portion of the total variation or 
effect. These two assumptions may be considered as the postulates of screening 
experiments. Under these assumptions, the existence of high residual variation 
in an experiment merely indicates that the most important variables were not 
included in the experimental design. Statistical significance alone is a function 
of sample size and this residual variation, and is thus not a good measure of 
what is or is not important in the real world. The absolute magnitude of residual 
variation must be considered. 

The assumption of a mal-distribution of causes to a total effect permits a 
screening of all variables which are possible contributors. By successive arithmetic 
corrections of the data, eliminating the largest effects, a point will be reached at 
which the absolute magnitude of residual variation is no longer affected appre- 
ciably by further corrections. Experience has shown no difficulty in attributing 
statistical significance to these few critical variables. What is missing in theory 
is the appropriate mathematical model for the mal-distribution assumption. 

It certainly should not be disconcerting to find that a screening experiment 
may at times require that a smaller confirming experiment on the identified 
variables follow. The screening experiment fulfills its objective in identifying 
the few important variables from the large number of variables which may 
contribute to the effect. In the real world, common sense alone dictates that any 
experimental findings should be checked and verified. The experimental risks 







must be reduced to an insignificant magnitude where dollars and cents are 
involved. 


CONSIDERATION OF THE DESIGNS 


A pure Random Balance Design consisting of a random sample from a full 
factorial design has several obvious deficiencies. The simple restriction of requiring 
that each level of each variable appear an equal number of times is a large 
improvement. Additional advantages are achieved by grouping variables into 
independent factorial and fractional factorial designs. The combination of var- 
iables in each group may then be randomized to form the complete design. For 
reasons which are discussed in the section under Techniques of Analysis and may 
even be obvious at this point, it is advantageous to include the variables suspected 
of strongest effects in a single group. In experiments of 32 tests, ¼ replicates of $2^7$
factorial designs have worked out very well as a means of grouping variables. 
Higher degrees of fractionation bring the disadvantages of deliberate and un- 
avoidable confounding. 

In a screening experiment there is generally no advantage in using more than 
two levels of the continuous variables. There are definite advantages of main- 
taining simplicity in the analysis with two levels of each variable. Other con- 
siderations, however, may require 3 or more levels. This is, of course, particularly 
true in the case of discrete variables. 


A SYNTHESIZED EXAMPLE 


It seems appropriate to examine the Random Balance Experiment closely in
a synthesized illustration in which all the facts are known beforehand. The
facts, the experimental design, and the analytical methods employed are patterned
after actual experiences.

We assume a condition in which 12 variables, A through L, are possible con-
tributors to a problem of excessive variation. The Random Balance Design
consists of two independent ½ replicates of a $2^6$ factorial design in which each
combination has been randomized with the order of test runs. The 32 synthesized
test observations start with a random sample from a normal population with
mean = 100 and standard deviation, $\sigma_0$ = 2.0. The following effects were
introduced by adding them to the test results of the (+) levels of the indicated
variables:

    −15 to G
    −12 to D
    +10 to JK interaction
    +8 to A
    +6 to EH interaction
    +4 to B
    −4 to I


The contributed variance in each case is $E^2/4$ where $E$ is the added effect.
The variance, $\sigma_1^2$, of the model becomes:

$$\sigma_1^2 = \sigma_0^2 + \sum \frac{E^2}{4} = 154.25$$

$$\sigma_1 \approx 12.4$$
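The arithmetic can be reproduced in a few lines. The sketch assumes that each
added effect $E$ acts on half of the observations, so that it contributes
$(E/2)^2 = E^2/4$ to the variance; the names `added_effects` and
`model_variance` are illustrative.

```python
from math import sqrt

sigma_0 = 2.0
# added effects as listed in the text (sign does not matter once squared)
added_effects = {"G": -15, "D": -12, "JK": 10, "A": 8, "EH": 6, "B": 4, "I": -4}

def model_variance(sigma_0, effects):
    """Residual variance plus E**2 / 4 for each added effect E."""
    return sigma_0 ** 2 + sum(e ** 2 / 4 for e in effects)

variance = model_variance(sigma_0, added_effects.values())
print(variance)        # total model variance
print(sqrt(variance))  # corresponding standard deviation
```

The total comes to 154.25, with a square root of about 12.4.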


Figure I below gives the numbered combinations of levels of the variable
combinations in the ½ replicates of the $2^6$ factorial design:

[Figure I: numbered combinations of (+) and (−) levels for the first group of
variables (A B C D E F) and for the second group of variables (G H I J K L);
the table entries are not legible in the source.]

Figure I


The added effects were introduced to disperse strong main effects into both 
groups. One interaction lies wholly within one group while the other is split 
between both groups. 
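A design of this kind can be generated mechanically. The following is a
minimal sketch, not the author's procedure: it assumes levels coded -1/+1,
builds each half replicate of the $2^6$ factorial by setting the sixth column
to the product of the other five (a defining relation chosen here purely for
illustration), and randomizes the pairing of the two groups by shuffling.

```python
import random
from itertools import product

def half_replicate_2_6():
    """32 runs: full factorial in five columns, sixth column equal to
    their product (an illustrative choice of defining relation)."""
    runs = []
    for levels in product((-1, 1), repeat=5):
        sixth = 1
        for v in levels:
            sixth *= v
        runs.append(levels + (sixth,))
    return runs

def random_balance_design(seed=None):
    """Pair the rows of two independently shuffled half replicates.
    Columns 0-5 play the role of A-F, columns 6-11 the role of G-L."""
    rng = random.Random(seed)
    group_1 = half_replicate_2_6()
    group_2 = half_replicate_2_6()
    rng.shuffle(group_1)
    rng.shuffle(group_2)
    return [a + b for a, b in zip(group_1, group_2)]

design = random_balance_design(seed=1959)
print(len(design), "runs,", len(design[0]), "variables")
# every variable appears 16 times at each level
print([sum(row[c] for row in design) for c in range(12)])
```

Within each group the columns are exactly orthogonal; between groups the
balance is only random, which is the point of the design.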

Figure II gives the completed design; the original random sample data, $Y_0$;
the synthesized data containing the added effects, $Y_1$; the successive stages of
processed data, $Y_2$, $Y_3$, $Y_4$ and $Y_5$; and the sample standard deviation in
each case.





[Figure II: table giving, for each of the 32 tests, the test number, the
factorial combinations for groups A-F and G-L, the levels of each variable,
and the test results $Y_0$ through $Y_5$. The individual entries are not
legible in the source. Standard deviation row, legible entries: 2.1, 13.5.]

Figure II: Design of the experiment and processed test data


Figure III shows the data, $Y_1$, in the graphical form of a scatter diagram for
each level of each variable. It is idealistic to expect a perfectly executed experi-
ment with no slip-up or errors of any kind, and not unusual to find that there
are reasonable doubts as to whether an observation should be kept or thrown out.
As a solution to this question, the median is particularly effective. It also has
the virtue of ease of calculation. The scatter diagrams indicate the median
location for each group of observations and the difference in median values for
each variable.
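The median comparison amounts to the following. This is a minimal sketch with
made-up numbers, assuming the design is held as rows of -1/+1 levels and the
results as a parallel list; `median_differences` is an illustrative name.

```python
from statistics import median

def median_differences(design, y, names):
    """For each two-level column: median of y at (+) minus median at (-)."""
    diffs = {}
    for col, name in enumerate(names):
        plus = [obs for row, obs in zip(design, y) if row[col] > 0]
        minus = [obs for row, obs in zip(design, y) if row[col] < 0]
        diffs[name] = median(plus) - median(minus)
    return diffs

# hypothetical four-run illustration, not data from the paper
toy_design = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
toy_results = [110, 112, 100, 98]
print(median_differences(toy_design, toy_results, ["A", "B"]))
```

Because the median ignores how far the extreme points lie, a single doubtful
observation has little influence on the difference.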

The differences 13.0, 14.5 and 19.5, for variables A, D and G, stand out sharply
from the others. The question of a significance test seems almost academic under
the circumstances. However, if a test is desired, a non-parametric test would
again be appropriate. The test used here is based on the length of the runs at
the high end and at the low end of the data. The lengths of such runs are indicated
in Figure III for the variables A, D and G. Under the null hypothesis that the
variable has no effect, the probability of a total of at least $R$ is

[expression not legible in the source]

where $n$ is the sample size of each of the two equal sub-groups of observations
and $R$ is the sum of the lengths of the runs at the high and low ends of the data.

[Figure III: scatter diagrams of the test results, $Y_1$, at the (+) and (−)
levels of each variable, A through L.]

Figure III: Scatter diagrams for main effects based on original test data, $Y_1$





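The probability of a sum of runs of at least ten, with $n$ = 16, is .005.
[The expression evaluated is not legible in the source.]

[Figure IV: $2^3$ array of the observations for variables A, D and G, with cell
averages and level averages. Legible entries include the averages 106.17 and
93.42 and the differences ΔD = −11.87 and ΔA = 10.60.]

Figure IV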


Thus, A and G, with run totals of 10 each, are significant at the 0.005 level.
D has a run total of eight, which is significant at the 0.02 level.
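One way to formalize the end-run count is sketched below. It is an assumption
of this sketch, not a transcription of the paper's formula: the 2n ranked
observations are taken to be a random arrangement of n (+) and n (−) labels,
and the statistic is the length of the unbroken run of (+) at the high end
plus the length of the unbroken run of (−) at the low end (one direction
only). The brute-force function is included only to check the closed form on
small cases.

```python
from itertools import combinations
from math import comb

def end_run_tail_prob(n, R):
    """P(top run of '+' plus bottom run of '-' >= R), both runs non-empty,
    for 2n ranked values with n at each level; covers 2 <= R <= n."""
    if not 2 <= R <= n:
        raise ValueError("this sketch covers 2 <= R <= n only")
    favourable = sum(comb(2 * n - R - 1, n - i) for i in range(1, R + 1))
    return favourable / comb(2 * n, n)

def end_run_tail_prob_bruteforce(n, R):
    """Same probability by listing every arrangement (small n only)."""
    hits = total = 0
    for plus_positions in combinations(range(2 * n), n):
        total += 1
        is_plus = [False] * (2 * n)
        for p in plus_positions:
            is_plus[p] = True
        top = 0                      # position 0 is the highest value
        while top < 2 * n and is_plus[top]:
            top += 1
        bottom = 0
        while bottom < 2 * n and not is_plus[2 * n - 1 - bottom]:
            bottom += 1
        if top >= 1 and bottom >= 1 and top + bottom >= R:
            hits += 1
    return hits / total

print(end_run_tail_prob(16, 10))
print(end_run_tail_prob(16, 8))
```

Under these assumptions the two printed probabilities come out at roughly
0.003 and 0.013, which lie below the 0.005 and 0.02 levels quoted for run
totals of ten and eight.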







The next step requires the removal of the influence of A, D and G from the
data so that lesser effects may be seen. In Figure IV, the observations are laid
out in a $2^3$ factorial array for variables A, D and G.

Due to the possible effects of the lack of perfect balance, averages have been
taken within each cell. The average effect for each level is the average of the cell
averages, and the effect of each variable is the difference in these level averages.
The differences 10.60, −11.87 and −12.81 for A, D and G respectively are the
basis for adding −11, +12, and +13 to the observations having (+) levels
of these variables respectively. The resulting corrected data is shown as $Y_2$ in
Figure II.
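The cell-average procedure and the arithmetic correction can be sketched as
follows. The data layout (rows of -1/+1 levels, a parallel list of results)
and the function names are assumptions made for illustration; the toy numbers
are not from the paper.

```python
from itertools import product

def effects_from_cell_averages(design, y, cols):
    """Effect of each column in cols: (+) level average minus (-) level
    average, where a level average is the mean of the cell averages in the
    2**k array formed by cols (so unequal cell counts do not distort it)."""
    cells = {}
    for row, obs in zip(design, y):
        cells.setdefault(tuple(row[c] for c in cols), []).append(obs)
    missing = [k for k in product((-1, 1), repeat=len(cols)) if k not in cells]
    if missing:
        raise ValueError(f"empty cells: {missing}")
    cell_avg = {k: sum(v) / len(v) for k, v in cells.items()}
    effects = {}
    for pos, c in enumerate(cols):
        plus = [a for k, a in cell_avg.items() if k[pos] > 0]
        minus = [a for k, a in cell_avg.items() if k[pos] < 0]
        effects[c] = sum(plus) / len(plus) - sum(minus) / len(minus)
    return effects

def correct_data(design, y, corrections):
    """Add corrections[c] to every observation whose column c is at (+)."""
    return [obs + sum(adj for c, adj in corrections.items() if row[c] > 0)
            for row, obs in zip(design, y)]

# hypothetical unbalanced 2x2 illustration
toy_design = [(1, 1), (1, 1), (1, -1), (-1, 1), (-1, -1), (-1, -1)]
toy_y = [96, 96, 108, 88, 100, 100]
effects = effects_from_cell_averages(toy_design, toy_y, cols=(0, 1))
print(effects)
print(correct_data(toy_design, toy_y, {c: -e for c, e in effects.items()}))
```

Raising an error on an empty cell mirrors the later remark in the text that
an array cannot be used when not all of its cells can be filled.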

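It may be noted that if a normal distribution may be assumed, the “t” statistic
would be:

$$
\begin{aligned}
&\frac{(\bar{x}_1 + \bar{x}_3 + \bar{x}_5 + \bar{x}_7) - (\bar{x}_2 + \bar{x}_4 + \bar{x}_6 + \bar{x}_8)}{S_{\bar{x}}} && \text{for the A effect;} \\
&\frac{(\bar{x}_1 + \bar{x}_2 + \bar{x}_5 + \bar{x}_6) - (\bar{x}_3 + \bar{x}_4 + \bar{x}_7 + \bar{x}_8)}{S_{\bar{x}}} && \text{for the D effect;} \\
&\frac{(\bar{x}_1 + \bar{x}_2 + \bar{x}_3 + \bar{x}_4) - (\bar{x}_5 + \bar{x}_6 + \bar{x}_7 + \bar{x}_8)}{S_{\bar{x}}} && \text{for the G effect;} \\
&\frac{(\bar{x}_1 + \bar{x}_4 + \bar{x}_5 + \bar{x}_8) - (\bar{x}_2 + \bar{x}_3 + \bar{x}_6 + \bar{x}_7)}{S_{\bar{x}}} && \text{for the AD interaction;} \\
&\frac{(\bar{x}_1 + \bar{x}_3 + \bar{x}_6 + \bar{x}_8) - (\bar{x}_2 + \bar{x}_4 + \bar{x}_5 + \bar{x}_7)}{S_{\bar{x}}} && \text{for the AG interaction;} \\
&\frac{(\bar{x}_1 + \bar{x}_2 + \bar{x}_7 + \bar{x}_8) - (\bar{x}_3 + \bar{x}_4 + \bar{x}_5 + \bar{x}_6)}{S_{\bar{x}}} && \text{for the DG interaction;} \\
&\frac{(\bar{x}_1 + \bar{x}_4 + \bar{x}_6 + \bar{x}_7) - (\bar{x}_2 + \bar{x}_3 + \bar{x}_5 + \bar{x}_8)}{S_{\bar{x}}} && \text{for the ADG interaction;}
\end{aligned}
$$

where $S_{\bar{x}} = S\sqrt{\sum 1/n_i}$,

where $n_i$ is the number of observations in each cell
and $S^2$ is the usual residual variance with $\sum n_i - 8$ degrees of freedom.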
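A contrast of cell means and its "t" value can be computed as below. This is
a sketch under stated assumptions: the standard error of the sum-and-difference
of cell means is taken to be $S\sqrt{\sum 1/n_i}$, with $S^2$ the pooled
within-cell variance, and the example uses four hypothetical cells rather than
the eight of the paper to keep it short.

```python
from math import sqrt

def contrast_t(cells, signs):
    """cells: one list of observations per cell.
    signs: +1 or -1 per cell, defining the contrast of cell means.
    Returns (t, residual degrees of freedom)."""
    means = [sum(c) / len(c) for c in cells]
    ss_within = sum((x - m) ** 2 for c, m in zip(cells, means) for x in c)
    df = sum(len(c) for c in cells) - len(cells)
    s = sqrt(ss_within / df)                       # pooled residual s.d.
    s_contrast = s * sqrt(sum(1 / len(c) for c in cells))
    contrast = sum(sg * m for sg, m in zip(signs, means))
    return contrast / s_contrast, df

# hypothetical data: four cells, two observations each
toy_cells = [[10, 12], [20, 22], [11, 13], [21, 23]]
t_value, df = contrast_t(toy_cells, signs=[+1, -1, +1, -1])
print(t_value, df)
```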


The proper level of significance for the selected highest effects is in question 
but this problem is no different from the one arising in any multiple test situation. 
The risk of isolating a pure chance effect can and should be remedied by later 
verification. 

The corrected data, $Y_2$, is again presented in scatter diagrams in Figure V.

Only the effects of variable C stand out uniquely. Since this effect is not of the
same magnitude as those already isolated, the interaction effects should be
screened for further large effects. A sample of the scatter diagrams having the
large interaction effects is shown in Figure VI. We note that the JK effect
stands out sharply, followed by the AH effect. We have then:

    JK with an approximate effect of 9.0
    C with an approximate effect of 7.5
    and AH with an approximate effect of 6.0

[Figure V: scatter diagrams of the corrected test results, $Y_2$, at the (+) and
(−) levels of each variable.]

Figure V: Scatter diagrams for main effects based on corrected test data, $Y_2$

[Figure VI: scatter diagrams of $Y_2$ for selected interactions, including JK.]

Figure VI: Scatter diagrams for selected interactions based on corrected test data, $Y_2$


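Since it is not possible to fill all cells in a $2^3$ array for JK, C, and AH, a $2^2$
array for JK and C, the two largest of the three effects, is presented in Figure VII.

[Figure VII: $2^2$ array of cell averages for JK (columns JK+ and JK−) and C.
Legible entries: an average of 101.18 with an illegible label, C+ = 96.99,
JK+ = 91.13, JK− = 99.49, ΔJK = −8.36.]

Figure VII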


Since the effect of C of about 3 units is considerably less than the JK effect
of 8 units, the data is corrected for the JK effect only. The results appear as $Y_3$
in Figure II. The newly corrected data again appears in scatter diagrams in
Figure VIII for main effects and in Figure IX for interactions. It can be seen
that the indicated effect of AH has vanished with the correction for the JK
effect.

[Figure VIII: scatter diagrams of the corrected test results, $Y_3$, at the (+)
and (−) levels of each variable.]

Figure VIII: Scatter diagrams for main effects based on corrected test data, $Y_3$

[Figure IX: scatter diagrams of $Y_3$ for selected interactions; the legible
panel titles include AH, AL, EH and EL.]

Figure IX: Scatter diagrams for selected interactions based on corrected test data, $Y_3$


The largest effects at this point are of a much smaller magnitude than those 
previously located. The magnitude of remaining residual variation indicates 
that further large effects cannot be present. The three largest effects are in the 
variables EH, CG and C. 





[Figure X: array of averages with columns C+ and C−. Legible entries include
the average 100.78 and the difference ΔC = 2.22.]

Figure X


Only EH gives a sizable effect of 5 units. The data, corrected for EH, appears
as $Y_4$ in Figure II and is again plotted in scatter diagrams in Figure XI.

[Figure XI: scatter diagrams of the corrected test results, $Y_4$, at the (+) and
(−) levels of each variable.]

Figure XI: Scatter diagrams for main effects based on corrected test data, $Y_4$


The three largest indicated effects, A, C, and J, are analyzed in the $2^3$ array
below:

[Figure XII: $2^3$ array of cell averages for A, C and J. Legible entries:
A+ = 96.88, A− = 97.66, ΔA = −0.78.]

Figure XII







The appropriate corrections of 4 units for C and −5 units for J appear in the
$Y_5$ data in Figure II.

The indications that the isolated effects are now small and that the residual
variance is a fraction of its original magnitude lead to the conclusion that little
more is to be gained by further analysis. Figure XIII visually shows the progress
made in reducing the residual variation to a close approximation of the original
random sample, $Y_0$.


[Figure XIII: frequency distributions of the test results, Y, on a scale from
80 to 130, one for each stage of processing beginning with $Y_0$.]

Figure XIII: Frequency distributions of data at each stage of processing


A summary of the actual added effects and the estimated effects appears in 
Figure XIV below: 


Multiple 


Variable Added Estimated 
effect effect 
G 15 13 
D 12 12 
JK 10 8 
A 11 
EH 5 
6 4 
I 5 


Figure XIV 
A least square multiple regression equation on the seven identified variables 
gives the estimated effects as they appear through a simultaneous correction 
procedure. 

CASE HISTORY PROBLEMS


The design in the illustration above closely resembles one which was used in
a malleable iron foundry which had the problem of excessive scrap, for a variety
of reasons. A Random Balance Experiment of 32 test runs programmed seven
metal variables in a ¼ replicate of a $2^7$ factorial design; 7 sand variables in a
similar design; and 1 mold and 2 pouring variables in a $2^3$ factorial design. The
two levels of each variable were set at what was known or appeared to be high
and low limits of normal operation. The limitations in deliberate variation of
furnace condition permitted only 2 runs per day, requiring 16 days for the com-
plete experiment. Samples for the experiment were poured only when the furnace
conditions settled to the required experimental conditions.

A selected group of casting patterns which were particularly sensitive to 
each type of defect: shrink, hot cracks, blows, etc., was used as a basis for meas- 
urement. 

Castings were rated 0, 1, 2, 3 or 4 for each type of defect. The use of a number 
of each type of casting for each test gave a total score for each defect which 
could be treated as a continuous variable. The results showed clearly the strong 
effect of only 2 or 3 variables on each type of defect. In some cases a level of a 
variable which benefitted one type of defect was detrimental to another type. 
The experiment led to better control of the few critical variables, appreciable 
reduction of control in others previously considered important, and a dramatic 
reduction in scrap losses. 

Another experience was one in a testing laboratory which found itself in serious 
trouble by its failure to get sufficiently reproducible results in a routine testing 
operation of a chemical product. The fault was definitely in the testing process 
rather than in the product. Three types of laboratory apparatus were involved 
with numerous steps by technicians before and after the use of each. The problem 
was pursued in a manner similar to a time and motion study in which every 
possible step which could conceivably affect test results was noted, and pro- 
grammed into a Random Balance Design. The principle followed was that a 
variable which had no effect could not influence the experimental observations 
and any variable that had an effect belonged in the experiment. Before the 
planned test had been completed, the sequential nature of accumulating test 
results permitted a clear recognition of three major sources of variation. They 
were: large differences in a filter material from two separate sources, large effects 
due to leaks in a vacuum filter system, and very large effects attributable to 
different locations in a forced air drying oven. Eleven other variables were 
found to have little or no effect. Control of the three critical variables was suffi- 
cient to reduce testing variation to a very acceptable level. 


CASE HISTORY DATA


A chemical company was badly in need of improvement in a particular prop- 
erty of a product. The property was reasonably consistent during manufacture, 
which led to a study of the effects of the specifications of quantity for 15 different 
ingredients. For specific reasons, three levels of each of the 15 ingredients were 
included in a Random Balance Design. The 15 variables were grouped into five 
3 × 3 × 3 factorial designs. The graphical results appear in Figure XV.

There is no doubt of the effect of variable J. Higher levels of this ingredient 
gave exactly what was wanted. There was no practical need to look for lesser 
effects. 





[Figure XV: scatter diagrams of the test results at the three levels (+, 0, −)
of each of the 15 ingredients, A through O.]

Figure XV: Scatter diagrams of original test results in a true case history


However, for the purpose of further exploration into analytical techniques
which may be useful for Random Balance Designs, the following considerations
may be of interest:

The variables were grouped as follows with indicated three factor interaction
mean squares:

    C, D and E    CDE m.s. = 16.8    Nothing significant at 5% level
    F, G and H    FGH m.s. = 8.5     Nothing significant at 5% level
    I, J and K    IJK m.s. = 1.1     J and K effects are significant
    L, M and N    LMN m.s. = 6.2     N effect is significant
    O, A and B    OAB m.s. = 5.0     O effect is significant

A comparison of residual (three factor interaction) mean squares
suggests that the important effects are in the I, J, K block.


SUMMARY 


The Random Balance Experiment proposed by F. E. Satterthwaite makes a 
positive contribution to the statistical and real worlds in providing a means by 
which high assurance can be given that the more important causes to a given 
effect can be isolated through the screening of a very large number of possible 
causes. In actual experience, all controllable variables which may be contributors 
to the effect belong in the one experiment. 

The writer, with a very wide experience in the application of this technique, 
has not yet encountered a situation in which the experiment has failed to find 
the critical few causes or on the other hand, has given misleading results. Even 
if this were not true, the failure to find the critical few variables would leave an 
unreduced residual variation which indicates that the real cause to the effect 
had not been programmed into the experiment. This in itself would not be an 
experimental failure. Commonsense follow-up procedures, on the other hand, 
would preclude any conclusions which are not sound. 

There is room for further development in the technique itself and in supporting
mathematical theory.



TECHNOMETRICS May, 1959 


Discussion of the Papers of 


Messrs. Satterthwaite and Budne 


W. J. Youden
National Bureau of Standards 


Perhaps it will help in discussing these papers to make a distinction between 
random balance as an interesting idea and random balance as a useful technique 
in experimentation. The latter aspect appears to be the chief area of disagreement. 
The technique of random balance must be judged against the established tech- 
niques in general use. 

Just over 50 years ago Student gave experimenters an exact quantitative 
measure to support the good judgement of experimenters who were comparing 
two means. Little more than a decade later, R. A. Fisher generalized the problem 
to provide an exact measure for judging whether several means, as a group, 
came from the same population. In the immediately following years, sceptics 
generally raised the question as to whether measurements made in the real 
world conformed well enough to the mathematical model to warrant the use of 
the ¢ and F tests. Caught in this period, and blessed with a laboratory, this 
discussant ran many, many experiments which included dummy treatments, 
i.e., with the same treatment given two labels, and built up a stout confidence 
in statistical techniques. 

The latter half of these 50 years has seen some interesting and valuable sta- 
tistical embellishments on the simple problem of evaluating two or more means. 
Some of these I’ll mention. Replication was the first support to be cut from under 
the experimental structure. Given a reasonably complex factorial experiment, 
the multifactor interactions were taken as a replacement for the estimate of error 
previously supplied by replication. There are two ways of looking at this device 
to save work. On the one hand, it may be presumed as a conservative act in that 
the estimate of the error would be biased upwards and one would be less prone 
to claim differences where there were none. On the other hand, and in the same 
measure, the experiment would lose sensitiveness for the detection of small 
differences. Thus the introduction of another factor or factors into the study in 
exchange for replication had the result of diminishing the power of the experiment 
to pick up an effect. 

The zeal with which men seek to learn more and more with less and less effort 
was soon made evident by the introduction of the fractional factorial. Here 
one-half or more of the factorial experiment was simply jettisoned on the argu- 
ment that, if higher order interactions could serve for estimating error, one 
could equally well confound these uninteresting contrasts with main effects and 




with low order interactions. Now a relatively large expansion in the number of 
factors became possible—but at a somewhat stiffer price. Not only was the 
estimate of error biased upward, but the contrasts of interest were all contami- 
nated with unknown effects hopefully regarded as unimportant. The experi- 
menter was now in the position of trusting that this confounding did not obscure 
real effects or create the appearance of real effects where in fact there were none. 

Statisticians went to work along another line to give the experimenter more 
freedom in interpreting his results. At least three systems of multiple comparisons 
have been proposed that make possible the selection of comparisons among 
treatment means after the data are in hand. Here, at least, statisticians did play 
fairly with the experimenter in that the required inflation of the effects before 
they became meaningful was clearly set forth. The experimenter, looking at the 
new tables, could plainly see that this privilege carried a substantial price in 
diminished sensitivity of his experiment. He could see the price tag and decide 
whether the article he got in exchange was worth the cost to him. 

I come now to random balance which manages to extend the number of factors 
and the number of interpretations after looking at the data without any clear 
indication of the price. The major reason offered in behalf of tossing so many 
factors (a score or more) into a program as small as 30 or so runs is that some 
important factor otherwise might be completely overlooked. The claim is made 
that anybody can plan such an experiment. This seems a doubtful argument in 
behalf of random balance. The discriminatory powers of trained investigators 
to dichotomize factors into those worth investigating and those of distinctly 
secondary interest constitutes our strongest weapon of research. It may scare an 
administrator to tell him his present techniques may never even test some im- 
portant factor in his process but that is far from telling the whole story. 

What the administrator needs to be told is a little of the facts of life. If the
effects associated with the two levels of a factor differ by an amount equal to the
standard deviation of a single run, it will require 16 runs at each level to achieve
an 80 percent chance of detecting this effect at the 95 percent level of significance.
A completely efficient experiment is required with no nonsense of using inter-
actions for error and no confounding with interactions. The proponents of random
balance emphasize that it is large effects that they look for and they can stand
a little inefficiency.

Well, how big an effect is required for detection using random balance? Say 
four times that mentioned above? If that is the order of magnitude, a single 
run at each level will suffice to catch the importance of the factor in question. 
If the effect is large, each factor in turn may be tried at a level other than the 
standard practice and there will be as many runs as there are factors. Of course 
this will not catch improvements that depend on the joint action of two or 
more factors. 
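The two figures quoted above, 16 runs per level for an effect of one standard
deviation and a single run per level for an effect four times as large, can
be compared with a short power calculation. The sketch below is an
approximation resting on assumptions not stated in the discussion: a
two-sided test at the 5 percent level, a known run standard deviation (normal
rather than t distribution), and no loss of efficiency from confounding. The
function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def power_two_level(effect_in_sigma, runs_per_level, alpha=0.05):
    """Approximate two-sided power for the difference between two level
    means, with the run standard deviation treated as known."""
    z = NormalDist()
    se = sqrt(2 / runs_per_level)       # s.e. of the difference, in sigma units
    shift = effect_in_sigma / se
    crit = z.inv_cdf(1 - alpha / 2)
    return (1 - z.cdf(crit - shift)) + z.cdf(-crit - shift)

print(power_two_level(1, 16))   # effect of one sigma, 16 runs per level
print(power_two_level(4, 1))    # effect of four sigma, one run per level
```

Under these assumptions both cases give the same standardized difference,
4/√2 ≈ 2.83, and hence the same power of roughly 80 percent.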

A governing rule among experimenters is the improvement of the precision of 
comparison by controlling and holding fixed every element of the process not 
directly under exploration. Those that cannot be held constant often are allowed 
for by correcting for their shifts. Random balance introduces into the program 
all these factors and does more than leave them uncontrolled. Random balance 




purposely sets wider limits than otherwise would be encountered in order to 
exaggerate the effects. Some of these factors with exaggerated ranges will produce 
more or less disturbance and presumably the experimenter is aware that he has 
no outside data on which to base a correction as he would in the usual course of 
work. The hope of using the data from the random balance results to provide 
these corrections is nil because they are not major effects. In consequence the 
disturbance remains in the results. This deliberate fouling of the experiment 
which is an inevitable consequence of elevating these factors to inclusion in the 
experiment will substantially increase the error of the comparisons. How much? 
Well, usually we expect a substantial reduction by taking pains to control 
conditions from even minor variations or by grouping the runs into blocks so 
that block effects may be removed. The effect on the error of purposely introducing 
major variations in these conditions can only be guessed at—but there is some 
evidence that it is substantial. This evidence is supplied by the proponents of 
random balance who make the point that they are looking for large effects. 
But this is trying to make a virtue out of necessity—under circumstances arising 
from the use of random balance. 

The price of looking in an unlikely place for a large effect appears to consist 
in accepting a substantial reduction in the chance of recognizing the moderate 
effects of factors that are likely to have them. The weakness of the random 
balance technique stands exposed once we see that random balance will often 
fail to detect factors that could have been identified as important factors although 
it is on just this pretext that random balance makes its bid for attention. 

Experimenters will expect statisticians to pass upon the questions that I 
have raised. I regard the inclusion of random balance on the program and the 
presence of this distinguished audience as evidence that statisticians are seeking 
to determine where random balance may be useful. 


Oscar KEMPTHORNE 


Statistical Laboratory 
Iowa State College 


Random Balance: An Evaluation* 


For some years we have been exposed to claims that random balance designs 
are very useful, and indeed have been given the impression that the body of 
designs that were put forward by Fisher, Yates and others is exceedingly poor. 
All this was rather frustrating when there was no definitive description of this 
new method of experimentation. The claims were made that we can look at
any number of factors with any number of observations and that we could
discover the existence of effects, two-factor interactions or 5-factor interactions
with, for example, 20 observations on 50 factors. It is a good thing for the
statistical profession that we are now being given a definitive beginning to the
theory of random balance experimentation so that we can see precisely what is
claimed and can evaluate whether the claims can be substantiated.

* This research was supported in whole or in part by the United States Air Force under
Contract No. AF 33(616)-5599, monitored by the Aeronautical Research Laboratory, Wright
Air Development Center.

Before embarking on my discussion I would like to register my disfavor with
describing something which is of the order of 30 years old as “classical”, but
perhaps this is in the tradition of the U.S.A.

I think that the argument Dr. Satterthwaite has presented relating the 
randomization of usual designs to the concept of random balance, is revealing. 

We do indeed practice some degree of random balance when we randomize 
the assignment of treatment combinations to experimental units, which may be 
defined in terms of space and/or time. Our purpose in doing this is two-fold:


(a) so that we have unbiased estimates of treatment differences and 
(b) so that we may obtain an unbiased estimate of error. 


It is to be noted however that the latter obtains only when there is no interaction 
of the treatment combinations and the experimental units. It is also to be noted 
that there is a basic difference in this randomization and the random balance of 
treatment factors which Dr. Satterthwaite advocates. In the randomized 
experiment our interest in the things randomized over—the experimental units— 
is negligible except to have negligible interactions of treatments and units. The 
randomized experiment of Fisher is aimed at assessing the applied treatments. 
In Dr. Satterthwaite’s procedure the aim is to assess the particular forces leading 
to variation in his experimental units. I would however like to assert that the 
theory of interpretation of observations based solely on randomization is in a 
poor state with regard to interactions of treatments and experimental units. 
Also randomization in this assignment has been restricted so that as far as
possible it is with regard to factors of the experimental units that are either 
unknown to the experimenter (either because they are unknowable or merely 
not observed) or are considered to be unimportant. This suggests that there 
may be real difficulties in disentangling interactive contributions in a random 
balance design, and I believe this is indeed the case. 

Dr. Satterthwaite has stated that he ignores in his presentation the two 
questions of multiple test bias and pooling bias. This makes the whole matter 
more reasonable because if these can be ignored the basis for random balance 
design is simply that in looking at one factor of the data we can forget about all 
the other factors. We obtain a measure of strength of evidence that each factor 
is important. Dr. Satterthwaite has not said that multiple test bias and pooling 
bias should be ignored, but from what one can observe from the present papers 
these are ignored. Dr. Satterthwaite promises subsequent papers on this matter 
and I hope that he will be able to do something about these matters. Whenever
I have heard presentations of random balance I have always reached the
conclusion: “This is nonsense because there simply aren’t enough degrees of freedom
to go around”. If one ignores multiple test bias there are enough.
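To give a feeling for the size of multiple test bias, the sketch below computes the chance of at least one spurious "significant" factor when many factors are each tested at the 5% level. It assumes the tests are independent, which the tests in a random balance experiment are not exactly, so the numbers are a rough guide only; the function name `familywise_error` is hypothetical.

```python
def familywise_error(n_tests, alpha=0.05):
    """Chance of at least one false positive among n_tests independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    for n_factors in (1, 10, 20, 50):
        print(n_factors, round(familywise_error(n_factors), 3))
```

With 50 factors and no real effects at all, this independence approximation puts the chance of at least one apparent effect above 90 percent.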

I think the days are gone when we could tell an experimenter “Here is your
design and here is the method of analysis” but I hope the day does not come
when we say “Here is a design, which I think is good, but I know of no way of 
analyzing it”. Some of Dr. Satterthwaite’s remarks are, I think, perilously close 
to this latter statement, because, at present, it is only for the simplest aspects 
of interpretation that there is any theory at all. 

The basis for random balance designs seems to me to be simply this: let
$y(x_1, x_2, \ldots, x_k)$ be the yield with inputs $x_1, x_2, \ldots, x_k$ and suppose that

$$y(x_1, x_2, \ldots, x_k) = a_0 + a_1 x_1 + a_2 x_2 + \cdots + a_k x_k + e. \tag{1}$$

Then, with random choice of $x_1, x_2, \ldots, x_k$ such that

$$E(x_i) = 0, \qquad i = 1, 2, \ldots, k,$$

we have

$$V(y) = a_1^2 V_1 + a_2^2 V_2 + \cdots + a_k^2 V_k + V(e) \tag{2}$$

and

$$y = a_0 + a_1 x_1 + f, \tag{3}$$

$$E(f) = 0,$$

$$V(f) = a_2^2 V_2 + a_3^2 V_3 + \cdots + a_k^2 V_k + V(e). \tag{4}$$

Also, because $x_1, \ldots, x_k$ are chosen independently for each observation, the
$f$’s are independent, so we may merely make a “t” test for the effect measured by
the coefficient $a_1$.
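The variance bookkeeping in this model can be sketched numerically. The example below is an illustration under stated assumptions: each input is set at random to +1/2 or -1/2 (so each input variance is 1/4), the error is normal, and the names `variance_components` and `simulated_v_f` are hypothetical. It computes V(y) and V(f) from the coefficients and then checks V(f) by simulation.

```python
import random
from statistics import variance


def variance_components(coefs, input_vars, error_var):
    """Return (V(y), V(f)) for y = a0 + sum(a_i * x_i) + e, isolating x_1.

    coefs[i] is the coefficient of the (i+1)-th input and input_vars[i]
    is the variance of that input under the random choice of levels.
    """
    contributions = [a * a * v for a, v in zip(coefs, input_vars)]
    v_y = sum(contributions) + error_var
    v_f = v_y - contributions[0]  # everything except the x_1 term
    return v_y, v_f


def simulated_v_f(coefs, error_var, n_obs, seed=0):
    """Sample variance of f when every input is +1/2 or -1/2 at random."""
    rng = random.Random(seed)
    f_values = []
    for _ in range(n_obs):
        others = [rng.choice((-0.5, 0.5)) for _ in coefs[1:]]
        e = rng.gauss(0.0, error_var ** 0.5)
        f_values.append(sum(a * x for a, x in zip(coefs[1:], others)) + e)
    return variance(f_values)


if __name__ == "__main__":
    coefs = [2.0, 1.0, 1.0]
    print(variance_components(coefs, [0.25] * 3, 1.0))
    print(simulated_v_f(coefs, 1.0, 20000))
```

The point of the sketch is that every other active factor inflates V(f), the yardstick against which the coefficient under test is judged.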

There is no need to restrict equation (1) to contain only linear terms. We
can put in terms like $a_{23} x_2 x_3$, which will contribute $a_{23}^2 V_2 V_3$ to both $V(y)$ and
$V(y \mid x_1)$. All expectations are of course over the population of possible repetitions
of $x_1, x_2, \ldots, x_k$ and $e$. I have said the basis is simply the above, but the above
is not really very simple. The reason is that we have no idea of what terms
should be in equation (1). What happens about squared and product terms in
the $x_i$’s? Also in this framework we cannot have an equation (1) which contains
more unknown parameters than observations. The array of values of the $x_i$’s,
commonly known as the design matrix, consists of $p$ vectors in an $N$-dimensional
space, where $N$ is the number of observations, and this $N$-dimensional space is
completely spanned by $N$ vectors. That there is a multitude of possible
interpretations of ordinary fractional factorial experiments is so well understood as
not to need explanation, and the mere introduction of a random amount of
non-orthogonality cannot enable much interpretation of more than $N$ or $(N - 1)$
factors. I will say more about this later.
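The dimensional argument can be checked directly. The sketch below builds a pure random balance design with 20 observations on 50 two-level factors (plus a constant column) and computes the rank of the design matrix by exact Gaussian elimination. The helper names `matrix_rank` and `random_balance_design` are hypothetical, and the coding of levels as +1 and -1 is an assumption of the example.

```python
import random
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix given as a list of rows, using exact arithmetic."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current rank with a nonzero entry.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            if m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank


def random_balance_design(n_obs, n_factors, seed=0):
    """Constant column plus independently randomized +/-1 factor columns."""
    rng = random.Random(seed)
    return [[1] + [rng.choice((-1, 1)) for _ in range(n_factors)]
            for _ in range(n_obs)]


if __name__ == "__main__":
    design = random_balance_design(n_obs=20, n_factors=50)
    print(matrix_rank(design))  # can never exceed the number of observations
```

Whatever the random assignment, the rank cannot exceed the number of observations, so at most that many independent parameters can be separated.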

Dr. Satterthwaite gives a table of the efficiency of pure random balance which
I fail to understand at all. If one is to discuss mathematically the optimality of
a particular method of experimentation, one must define with respect to what
the optimality is measured. Dr. Satterthwaite seems to say that the purpose
of random balance experimentation is to reach a reasonable model, which I
interpret to mean an approximation (and I would not be inclined to press too
hard on the mathematical basis for such an approximation, provided there
were a mathematical basis) to the relation of yield to the inputs. Mr. Budne,
on the other hand, seems to say that the purpose of random balance
experimentation is to screen an array of factors to find the subset which do affect the yield
non-trivially. I can only assert my view that Dr. Satterthwaite’s hopes are
hopelessly overambitious unless the situation is a very simple one. Dr.
Satterthwaite’s table of efficiency seems to me to have no logical basis at all.

If one considers the matter from Mr. Budne’s viewpoint, a little elementary
reasoning is possible. Using normal theory¹, one can speculate about the
sensitivity of a test of $a_1$, say, by noting that if $x_1$ takes the values $+\tfrac{1}{2}$ and $-\tfrac{1}{2}$
with approximately equal frequency and we have $N$ observations, and one uses
5% as a cutoff point in the test which evaluates a particular factor, the
non-centrality parameter of the test ($\lambda$ in Tang’s notation) is

$$\lambda = \frac{N}{8}\,\frac{a_1^2}{V(f)},$$

the corresponding $\phi$ is

$$\phi = \sqrt{\frac{N}{8}\,\frac{a_1^2}{V(f)}},$$

with numerator degrees of freedom 1 and denominator degrees of freedom $N - 2$.
The dependence of sensitivity on denominator degrees of freedom is negligible
if there are over 20 so this may be ignored. With a 5% test the probability of
detecting the difference is (taking $N$ to be 30)

about .28 if $\phi$ is 1
      .54 if $\phi$ is 1.5
      .78 if $\phi$ is 2
      .93 if $\phi$ is 2.5
  and .98 if $\phi$ is 3

For sake of argument suppose we require this probability to be 0.75; then
$N a_1^2 / (8 V(f))$ must be greater than about 4.
The whole problem is to see if circumstances can be specified in which this will
happen, and this depends on guessing the size of $V(f)$. The situation is that we
do not know $V(f)$ because to know it we must know the equation (1) and if we
know equation (1) we do not need to experiment.
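The detection probabilities quoted above can be approximated without tables. The sketch below is a rough check only: it replaces the exact calculation by a normal approximation (so it ignores the denominator degrees of freedom and should run slightly above the quoted figures), and it assumes that the argument of the power function is Tang's phi for one numerator degree of freedom, which corresponds to a standardized difference of phi times the square root of 2. The name `approx_power` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(phi, alpha=0.05):
    """Normal approximation to the power of a two-sided test.

    phi is taken to be Tang's phi with one numerator degree of freedom,
    so the standardized difference is phi * sqrt(2).
    """
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)
    delta = phi * sqrt(2.0)
    return z.cdf(delta - crit) + z.cdf(-delta - crit)


if __name__ == "__main__":
    for phi in (1.0, 1.5, 2.0, 2.5, 3.0):
        print(phi, round(approx_power(phi), 2))
```

The approximation reproduces the pattern of the quoted values: the power passes 0.75 at about phi equal to 2, that is, when the square of phi is about 4.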

If the experiment is small and there is only one effective factor, with no
curvilinearity, and an effect $a_1$ equal to $\sqrt{V(e)}$, then $N = 32$ would do the job.
If there should be another factor acting linearly with a similar effect, $N$ would
have to equal 64. One can consider all sorts of possibilities but it seems to me
that there are two extremes:

(1) there is no strong evidence for any factor
or (2) there is strong evidence for any factor because N is large.

Where we are in this limbo of possibilities we do not know.

¹ One can presumably develop something by randomization theory but I suspect normal
theory is adequate.
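The sample sizes 32 and 64 follow from requiring that N times the squared effect, divided by 8 V(f), be at least about 4. The sketch below solves that condition for N. The function name `required_n` is hypothetical, and the second call assumes that the additional factor doubles V(f), which is the reading under which the figure of 64 is obtained.

```python
from math import ceil


def required_n(effect, v_f, phi_squared=4.0):
    """Smallest N with N * effect**2 / (8 * v_f) >= phi_squared."""
    return ceil(8.0 * phi_squared * v_f / effect ** 2)


if __name__ == "__main__":
    # One effective factor with effect equal to sqrt(V(e)), so V(f) = V(e) = 1.
    print(required_n(effect=1.0, v_f=1.0))
    # A second factor assumed to double V(f).
    print(required_n(effect=1.0, v_f=2.0))
```

The required N grows in direct proportion to V(f), which is why each further active factor makes every other factor harder to detect.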

What the sensitivity is with regard to high-order interactions I have not
worked out. But it seems to me incontrovertible that high-order interactions
will occur as a result of confounding with low-order interactions or effects.
We are all aware of this possibility with fractional factorials, and deliberately 
and knowingly take the risk of there being large interactions, which we will not 
discover because we shall accept the simpler explanation possible with the data. 
If Dr. Satterthwaite took a similar view I would have no quarrel with him. 
His extravagant claims carry little weight and it seems worth expressing the 
hope that Dr. Satterthwaite will realize that he injures the chances of whatever 
good points random balance possesses being generally accepted by making such 
claims. 

I believe that a large number of evaluatory procedures for random balance 
experiments can be developed on the basis of the physically applied randomiza- 
tion and I believe that publication of the present papers will lead to such develop- 
ment. 

I regard the hypothetical example of Mr. Budne on page 5 of his manuscript 
as thoroughly unrealistic. Has he considered how big an experiment is necessary 
to discover a factor whose effect is 7 times the standard deviation of a single 
observation, supposing that we have a fair a priori estimate of the reliability 
of a controlled observation? I guess about two observations would do the job. 
The fact that Mr. Budne’s analysis leads to a fairly simple interpretation and 
model is not convincing. He removes the effects of those factors with apparent 
large effects and seems to be carried away by the ease of the whole job. It is 
essential to note that the early adjustments are so large relative to their errors 
of estimation that the resulting data are almost as “good’’ as the original data. 
What magnitude of effects is it desirable to detect in an industrial experiment? 
One can only average one’s own experience and surmise, but I believe that effects 
less than one standard deviation of a controlled observation may be very impor- 
tant. If a process with many steps leads to a proportion of defectives equal to 
0.1, the standard deviation of an individual is 0.3 and an improvement equal to 
one-third of a standard deviation would make the “foundry man” very happy. 
If one worked with lots of 9, the desired effect would still be only one standard 
deviation and it would be surprising if one factor accounted for all of this. Of 
course, if one applies random balance in a situation where there are large and 
simple effects one will discover them, but will one discover them any better than 
with Plackett-Burman designs or with fractional designs based on Fisher’s 
work, or with the further designs mentioned by Dr. Tukey at the Pittsburgh
meeting (1959)? I believe not. What can be the conceivable gain of deliberately
introducing random partial confounding?
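The arithmetic of the defectives example can be laid out explicitly. The sketch below treats each item as a binomial trial, which is an assumption of the example, and uses the hypothetical helper `sd_of_proportion` to give the standard deviation of the proportion defective in a lot.

```python
from math import sqrt


def sd_of_proportion(p, lot_size=1):
    """Standard deviation of the proportion defective in a lot of given size."""
    return sqrt(p * (1.0 - p) / lot_size)


if __name__ == "__main__":
    p = 0.1
    improvement = 0.1  # eliminating the defectives altogether
    sd_single = sd_of_proportion(p)            # 0.3 for one item
    sd_lot = sd_of_proportion(p, lot_size=9)   # 0.1 for a lot of 9
    print(improvement / sd_single)  # about one-third of a standard deviation
    print(improvement / sd_lot)     # about one standard deviation
```

Even the largest conceivable improvement is therefore only about one standard deviation when the unit of observation is a lot of 9.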

I wonder what reactions Dr. Satterthwaite would have to one of the designs 
mentioned in the previous paragraph. I also wonder what his reaction would be 
to such a design with the additional random balance feature of assigning the 





164 OSCAR KEMPTHORNE 


two levels in the pattern of the design at random to the two levels of each factor. 
This would have one or two desirable features; for example, if in the unran- 
domized design A is completely confounded with BC then in the randomized 
design, A will be randomly confounded with either +BC or —BC so that on 
the average the estimate of the A effect will tend to the true value. In this design 
we would have random balance with perfect orthogonality of simple effects. 
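The averaging argument can be verified by enumeration. The sketch below takes the half fraction of a 2^3 factorial in which A is completely confounded with BC, assigns the two pattern levels to the two factor levels in every possible way, and computes the estimate of the A effect each time. The response model (only A and the BC interaction active, no error) and the function names are assumptions made for the illustration.

```python
from itertools import product

# Half fraction of a 2^3 factorial with defining relation I = +ABC,
# so the A column equals the BC column in the unrandomized pattern.
HALF_FRACTION = [(a, b, a * b) for a, b in product((-1, 1), repeat=2)]


def estimate_of_a(beta_a, beta_bc, flips):
    """Estimate of the A coefficient for one assignment of pattern levels.

    flips holds +1 or -1 for each factor: -1 means the two pattern levels
    were assigned to the factor's levels the other way round.
    """
    flip_a, flip_b, flip_c = flips
    total = 0.0
    for a, b, c in HALF_FRACTION:
        level_a, level_b, level_c = flip_a * a, flip_b * b, flip_c * c
        y = beta_a * level_a + beta_bc * level_b * level_c
        total += y * level_a
    return total / len(HALF_FRACTION)


def all_estimates(beta_a, beta_bc):
    """Estimates of A over all eight equally likely level assignments."""
    return [estimate_of_a(beta_a, beta_bc, flips)
            for flips in product((-1, 1), repeat=3)]


if __name__ == "__main__":
    estimates = all_estimates(beta_a=3, beta_bc=2)
    print(estimates)
    print(sum(estimates) / len(estimates))
```

Each single assignment gives the A coefficient plus or minus the BC coefficient, but the average over the assignments is the A coefficient itself.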

This design and the one used by Mr. Budne are a far cry from using fewer
observations than factors and I anticipate that such a possibility will die a rapid
death.

I believe Dr. Satterthwaite and Mr. Budne do not have sufficient fear of the 
difficulty of interpreting random balance designs. There are at least two aspects 
of such interpretation. The simpler one is assessing the weight of evidence that 
each factor is important, but account must be taken of what Dr. Satterthwaite 
calls multiple test bias and I hope he can do something about it. 

The other aspect is the fitting of a functional relationship and here I believe 
one could spend a large amount of time (both professional and computational 
and equally expensive) going forwards and backwards in removing effects and 
then finding one has to put them back. To claim, as both authors appear to,
that a “foundry man” can do the analysis and understand the results is an
oversimplification. I have heard of a case in which a large amount of work by highly
trained workers and a lot of time on a good computer were required.

I have many comments on particular detailed points of the two papers: 


(1) Randomization over a finite set does not induce independence and while 
above I used normal theory as an approximation to arriving at some 
notion of sensitivity, I think statistical procedures must be based on the 
applied randomization (cf. Anscombe this issue) 

(2) Dr. Satterthwaite lists advantages of random balance designs. That the
design problem is simple is easy to see. That the designs possess “analysis
simplicity” is, I hope, not accepted on the basis of what I have said above.
All sorts of graphical analysis are possible with orthogonal designs and the
advantage of random balance designs in this respect is not clear. If
“managers and technical personnel” have the attributes Dr. Satterthwaite
impugns, we clearly need to educate them in ways other than statistical.
Whether feedback is reliable with random balance is a moot point because
random balance manufactures a large amount of variability. That it
later explains part of this may be convincing to an engineer but it should
not be. As regards sequentialization there is a point. That the number of
experiments (i.e. runs) is flexible may not be a big advantage, because it
will engender a strong tendency to chase any small hare that appears.
Indeed a case can be made for the statistician taking a fairly strong line,
that he will not examine the data until the plan has been completed. It is
not at all uncommon, I believe, to find that engineering personnel go in
different directions on very questionable evidence in a troublesome situation
and after doing this for months they frequently do not have a clue on what
they know or don’t know. As regards efficiency I have already made
comments.


(3) Dr. Satterthwaite says that in an investigation on temperature and pressure, 
“the interpretation of the effect of pressure is not influenced by the specific 
evaluation of the temperature effect obtained”. This is true only for the 
simple tests of significance and is not true when one chooses and fits a model. 

(4) Dr. Satterthwaite mentions a new technique “polyregression analysis”.
What is this technique or, more simply, what does it try to do and what 
are its elemental aspects? 

(5) I do not think the “pick-the-winner” method is relevant to the discussion. 
It certainly does not lead to formulation of a model. 

(6) Dr. Satterthwaite lists seven circumstances (a, b, c, d, e, f, g) in which random
balance is likely to be a good procedure. I found that I could see possibility
of agreement with him on none, though I might “buy” circumstance g.

(7) There are many calls on Dr. Satterthwaite’s experience on which, I am 
afraid, I must be quite skeptical. 

(8) It is stated that “classical” designs are limited to 5 to 10 factors. On the 
one hand I imagine Dr. Cuthbert Daniel disagrees and on the other what 
is ‘classical’? 

(9) We are told that “technically trained personnel” use a mixture of exact 
and random balance designs. Is this a justification? 

(10) It is said that we should use tests of size 10~° or 10~"°. Except for very 
specialized situations, in which in addition the validity of inferential 
method should be above doubt, this will be practically impossible to achieve. 

(11) Dr. Satterthwaite wrote very carefully about unbiased estimates and
confidence limits, but are his second iteration estimates unbiased and can
one construct a confidence region of known (or bounded) size for his
estimates?

(12) Dr. Satterthwaite says “Investigators in both methods tend to use random
balance ... whether they are right or wrong in doing so is often not
pertinent”. No comment seems necessary.

(13) The statistical difficulties of working with residuals are not mentioned
at all by Mr. Budne and are treated rather cursorily by Dr. Satterthwaite.
I believe they can be tremendous. However we may expect more work on
this line to arise from the present publications.

(14) I understand very little of the mathematics of Dr. Satterthwaite’s presenta- 
tion, whereas I think I would if it were correct. The onus is on Dr. Satterth- 
waite to produce a body of theory which is convincing. 

(15) I do not agree with Mr. Budne that one should depend on a “foundry 
man” for the plan of action etc. This is rather like expecting competent 
medical treatment from a nurse. 

(16) Mr. Budne’s presentation of how his analysis reduces the unexplained 
variation is convincing only so long as one forgets that the high original 
variation was introduced by the technique. 


CONCLUDING REMARKS


At the time I was reading the papers I came across the following quotation
from Sir Arthur Eddington: “Progress is measured not so much by the problems
we are able to solve as by the questions we are enabled to ask”. I think this
presentation of random balance raises many questions which are important to 
statistics, even if random balance should fade away. It is, I think, too early to 
say whether this will happen and I think random balance with regard to factors 
which we believe to be unimportant but wish to check may become a useful 
procedure. However the probability of this is lessened I believe by present work 
on factorials and specifically that presented at Pittsburgh by J. W. Tukey on 
little pieces of factorials. In any case I think the ere of Dr. Satterthwaite’s 
ideas is worthwhile. 


John W. TUKEY


Princeton University and Bell Telephone Laboratories 


It seems to me, Jack, that what you said about the sadness of using higher 
order interactions for error, and the consequent reduction of the sensitivity of 
the experiment didn’t take into account the passages in The Design of Experi- 
ments about broadening the base of inference. Lower sensitivity may be a con- 
sequence of measuring something much more meaningful to your purpose. Often 
it should be accepted. 

It also seems to me that what you said about the price of multiple comparisons 
is likely to be misinterpreted. The price in sensitivity for a fixed 5% overall- 
error rate has already been paid once you have gone from a t-test of a single 
contrast to an F-test of all differences among several determinations. There is 
nothing more to pay when you pass from F-test to multiple comparisons. 

* * * *

Scatter plot analysis is graphical and striking, hence important. A quick and 
satisfactory significance test has some interest. Define the outer sum to be the 
sum of the number of cases in one group with higher responses than all cases in 
the other group and the number of cases in the other group with lower responses 
than all those in the first group, both numbers being at least one. The simple 
5% point for the outer sum is about 7 for all circumstances likely to concern us 
(see the first issue of Technometrics for details). 
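The outer sum defined above is simple enough to compute by hand, but a short sketch may make the definition concrete. The function below follows the definition as stated in the text; it is a simplification in that it assumes no tied values between the two groups, and the name `outer_sum` is hypothetical.

```python
def outer_sum(first, second):
    """Outer sum of two samples, taken as 0 when the test does not apply.

    Counts the cases in one group lying above every case in the other,
    plus the cases in the other group lying below every case in the first.
    Both counts must be at least one. Ties are not handled.
    """
    if max(first) > max(second):
        high, low = first, second
    else:
        high, low = second, first
    if min(low) >= min(high):
        return 0  # one group holds both extremes, so there is no outer sum
    top = sum(1 for v in high if v > max(low))
    bottom = sum(1 for v in low if v < min(high))
    if top == 0 or bottom == 0:
        return 0
    return top + bottom


if __name__ == "__main__":
    low_level = [1, 2, 3, 4, 5, 6]
    high_level = [5.5, 7, 8, 9, 10, 11]
    print(outer_sum(low_level, high_level))
```

A single scatter plot would be judged against the critical value of about 7 quoted above; the example data give an outer sum of 10.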

Scatter plots based on random balance do not occur singly; they occur in 
eights, or twelves or twenties. If we want to spend 5% total error rate on all 
the scatter plots of such a group, the corresponding critical value will usually 
be larger than 7. Table 1 gives the numbers of two-sample comparisons per 
group (= number of scatter plots per group) which correspond to a given critical 
value for sample sizes of 8, 16 and ∞ (= for experiments with 16, 32 or ∞ trials)
and an overall error rate of 5%. Mental interpolation to the actual sample size 
will usually suffice. For the main effects of 12 or 15 variables (groups of 12 to 
15 scatter plots) the critical value is 11. If we keep in mind both the 5% per 
scatter plot critical value of 7, and the 5% per group of scatter plots critical 
values of 11, we can assess the strength of the scatter plot evidence quite ef- 
fectively. 

























TABLE 1
Numbers of two-sample comparisons which can be made side-by-side without exceeding an overall
error rate of 5% (for the whole group of comparisons). For equal or nearly equal sample sizes.
Mental interpolation between the sample sizes given will usually be adequate.


Value of sum            Number of observations per sample*
required for 5%
per family              10                20                ∞

      7                 1                 1                 1
      8                 2                 —                 —
      9                 3 or 4            2 or 3            2 or 3†
     10                 5 to 9            4 to 6            4 or 5
     11                 10 to 18          7 to 13           6 to 9
     12                 19 to 36          14 to 25          10 to 17
     13                 37 to 72          26 to 51          18 to 31
     14                 73 to 144         52 to 102         32 to 58
     15                 145 to 288        103 to 204        57 to 109
     16                 289 to 577        205 to 407        110 to 204
     17                 578 to 1150       408 to 816        205 to 385
     18                 1151 to 2310      817 to 1632


* Number of trials per level in a two level pattern.
† Yield rates slightly above 5%.
Note: For 1% error rate overall, divide entries in body of table by 5.


The scatter plots in Budne’s paper were so impressive that I took down Davies’ 
The Design and Analysis of Industrial Experiments, as a good source of standard 
factorial and fractional factorial designs—2”s and the like—based on real data, 
and scatter-plotted a number of classical experiments. It was my impression that 
I learned nearly as much from these scatter plots as from the complete analysis 
of variance. This does not mean that I think that scatter plots will replace the 
full analysis for these designs, but it does mean that (i) they may go a long 
way, and (ii) mechanization of their preparation as a first step of analysis is 
likely to be worth while. (Fifteen or more scatter plots from a single tabulator 
run seems quite possible.) 
* * * *

Who will design and analyze experiments? This is an important point on 
which I seem to disagree with some of the other discussants. At the moment, 
supply and demand have made statisticians an expensive commodity. Ten 
years from now this will still be so. There are not, and will not be enough stat- 
isticians to go around. Non-statisticians, many without statistical training, are 
going to have to plan and analyze most experiments. They will often not do as 
well as they might with statistical training or cooperation. But if statistics is 
not available it will have to be done without. Random balance is, in part, di- 
rected toward the needs of the statistically untrained. 

Constant balance patterns—and I suggest that this, or constant balance 
designs if you prefer, is a far better term than classical designs—can be made 


























































































TABLE 2 


Some fundamental saturated and nearly saturated patterns with <200 trials. (Bracketed
patterns can be regarded as parts of one another)


Saturated Plackett-Burman Extended More extended
fractions* patterns** fractions fractions


a 24 in 12 238: 4 24.5 
2 i 2% in 24 22.3 i . 235 
25 j 27 in 48 - 283 i 22.5 
23 ij 2°5 in 289.3 \ 272.5 
28 j 243 i 2182.5 
Qu7 j 219 in 
2°9 in 2°,.3¢ i 214,56 
3 i 279 in 278,34 j 272,56 
318 i 288 34 ; 
30 j 227 in 2148, 38 j 2'6.3.5 
255 in 248,3.5 
45 j 278,313 j 2423.5 in 128 
421 i 235 in 284 318 i 
27 in 3455 i 50\ 
5¢ i 280,340 j 248,34.55 in 100) 
52 ji 2 in 
287 in 216.45 j 377.5 in 81 
we a 248,45 j 280,327.58 in 62 
8° ? AD in Quz 45 } 
gio j 259 in (inefficient cases) 
112 j 287 in 284,421 j 345° in 50 
134 i 275 in 76 248,78 ji 248.34.5'° in 100 
2% in 84 2%, 89 
2° in 100 
Number 
of 


Sets ~ (9) (11) (8) (5) 


* R. A. Fisher, 12 Annals of Eugenics 283-290, 1945, also paper 40 in his Contributions to 
Mathematical Statistics (for more than 2 levels). 
** R. L. Plackett and J. P. Burman, 33 Biometrika 305-325, 1946. 


available to people without statistical training, available to all those who design 
experiments. To do this, we must do three things. We must have the patterns— 
we must package them so that almost everyone can use them—and we must 
have a simple analysis available. I think we have the analysis and patterns. 
Scatter plots are a very good simple analysis. And, as I said here yesterday, 
reasonably small patterns with mixed numbers of levels can now be set up 
without too much difficulty. Table 2 lists some possibilities. 

Table 3 is an example of how you might package a pattern so that it would 
be relatively easily used. This is a small pattern; this package would not take 
much space when printed in a book of tables. All the patterns in Table 2, which 
includes almost all the instances with up to 200 trials per experiment that one 







TABLE 3 
An example of packaging a saturated design in 8 trials for as many as 7 variables at 2 levels. 


Variables Some 
3 4 5 random orders 


wee 
ee 
me ky 
Ry Ry ee RR 
aCoararwnnd >» 
ane & OD dS WwW WO 
eK OrwWNW OanN 
OCONnNrReK ON Ow 


+1+ 
++ 


Notes: (1) Randomize the following: (A) selection of variables (if fewer than 7 are used);
(B) assignment of actual factors to variable numbers; (C) assignment of levels of each factor
to “H” (head) and “T” (tail); (D) order of trials.

(2) Confounding of interactions may be traced by multiplying symmetry signatures. 

w=) {4 ois 

Example: variables (5) and (6) interact according to|}-+|-|—| = |—|; thus the interaction 
a eo + 

is confounded with variable (4). 


would like to use, would take a book of only some two hundred and forty pages. 
If you were willing to restrict yourself to not more than 100 trials per experi- 
ment, only a hundred pages would be needed. But until this packaging has been 
done, until constant balance is very nearly as easy to use as random balance, 
constant balance will find it very hard to compete with random balance in much 
of the area discussed by the speakers. 
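A package of the kind described, a saturated pattern for up to 7 two-level variables in 8 trials with the randomization steps spelled out, can also be sketched in code. The example below is illustrative only: the column numbering is its own and does not reproduce the variable numbers or symmetry signatures of Table 3, and the function names and factor names are hypothetical. It builds the seven orthogonal columns, applies randomization steps (A) to (D), and traces confounding by multiplying columns.

```python
import itertools
import random


def saturated_two_level_8():
    """Seven mutually orthogonal +/-1 columns for 8 trials."""
    base = list(itertools.product((-1, 1), repeat=3))
    words = [(1, 0, 0), (0, 1, 0), (0, 0, 1),
             (1, 1, 0), (1, 0, 1), (0, 1, 1), (1, 1, 1)]
    columns = []
    for word in words:
        column = []
        for run in base:
            sign = 1
            for level, used in zip(run, word):
                if used:
                    sign *= level
            column.append(sign)
        columns.append(column)
    return columns


def packaged_plan(factor_names, seed=None):
    """Randomized plan: one dict per trial mapping factor name to H or T."""
    rng = random.Random(seed)
    columns = saturated_two_level_8()
    chosen = rng.sample(range(7), len(factor_names))      # steps (A) and (B)
    flips = [rng.choice((-1, 1)) for _ in factor_names]   # step (C)
    order = list(range(8))
    rng.shuffle(order)                                    # step (D)
    plan = []
    for trial in order:
        row = {}
        for name, col, flip in zip(factor_names, chosen, flips):
            row[name] = "H" if columns[col][trial] * flip > 0 else "T"
        plan.append(row)
    return plan


def confounded_with(columns, i, j):
    """Column (and sign) with which the interaction of columns i and j coincides."""
    prod = [x * y for x, y in zip(columns[i], columns[j])]
    for k, column in enumerate(columns):
        if column == prod:
            return k, 1
        if column == [-v for v in prod]:
            return k, -1
    return None


if __name__ == "__main__":
    for row in packaged_plan(["temperature", "pressure", "speed"], seed=1):
        print(row)
    print(confounded_with(saturated_two_level_8(), 0, 1))
```

A printed book of tables would serve the same purpose; the sketch only shows that nothing beyond a fixed pattern and four random choices is required of the user.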

Of course constant balance can only take us up to saturation (one of George 
Box’s well-chosen terms), up to the situation where each degree of freedom is 
taken up with a main effect (or with something else we are prepared to estimate). 
I think it is perfectly natural and wise to do some supersaturated experiments. 
Supersaturated constant balance patterns are impossible but it doesn’t seem to 
me that, at least in the hands of people who can read a table, random balance 
is going to be used completely or indefinitely in the supersaturated region. The 
speakers have told us to avoid pure random balance. The logical evolution of 
restricted random balance is to as near balance as may be had. I think families 
of constant near balance patterns are going to appear, and are going to be used. 

* * * * 

Side effects, in which I include everything except main effects, are an im- 
portant and controversial topic in relation to random balance. I don’t believe 
that interactions will be detected in nearly, exactly, or supersaturated experi- 
ments unless they are quite substantial. 







Consider a saturated pattern, with every degree of freedom covered by a main 
effect, and consider a specific side effect, which we could surely detect if it were 
tremendous. When is one direction in the pattern, with regard to main effects, 
better than another for the side effect? If the main effects are randomized, they 
contribute the same variance in all directions, but the higher even cumulants of 
their contribution will vary from direction to direction. The fourth cumulant, 
which I like to call the elongation, depends on the sum of fourth powers of 
cosines of angles, and is likely to be the most important of these. If only a few 
main effects are large, it is good to have this sum as small as possible, since this 
diminishes the chance of concealment of a modestly large side effect by an 
accumulation of weakly confounded parts of main effects. 

I have heard Frank Satterthwaite worry about the identical confounding of 
interactions with main effects in saturated fractional factorials for some time, 
but only very recently did I realize that this sum, which is 1.00 for a saturated 
fractional factorial (the most unfavorable value possible), was a good indicator 
of quality of confounding. The Plackett-Burman patterns (those mentioned in 
Biometrika 33, 305-325, 1946 which are not fractional factorials) are saturated 
patterns with quite different properties. Comparative values of the sum are 
shown in Table 4, and it is clear that fractional factorials are very bad while 
Plackett-Burman patterns are very good. 


TABLE 4 

Value of $100 \sum \cos^4 \theta$ for individual side effects of specified character in certain saturated 
patterns (summation over all main effects). Larger values indicate greater danger of concealment 
of such a side effect. 

Saturated patterns with $2^k$ trials 
(fractional factorials) 

Size of pattern   Formal interaction   Distinctive combination   Minimum possible   Random* 

       8                 100                     33                    14             55+ 
      16                 100                     33                     6.7           76+ 
      32                 100                     33                     3.2           88+ 
      64                 100                     33                     1.5           94+ 

Saturated patterns with 4g trials 
(Plackett-Burman patterns) 

Size of pattern   Formal interaction   Distinctive combination   Minimum possible   Random* 

      12                                         23                     9.1           69+ 
      20                                         24                     5.2           81+ 
      24                                         23                     4.3           84+ 
      28                                         23                     3.7           86+ 

* Quite rough approximations to the average value of $100 \sum \cos^4 \theta$ for random balance. 
Note that, in complete contrast to the situation for constant balance: (1) different side effects 
of the same character need not all have the same values in a particular realization, (2) mean 
values for different particular realizations vary. As a consequence: (a) the overall average 
depends on the degree to which the random balance is restricted, (b) individual values 
may be much larger, or much smaller, than any overall average. 
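
The sum tabulated above can be checked numerically for the two smallest patterns. The following is a minimal sketch, not part of the original discussion: the function names are illustrative, the 12-trial pattern is built from the generator row + + - + + + - - - + - usually quoted for the 12-run Plackett-Burman design (an assumption about which pattern is meant), and a distinctive combination is taken as equal parts of A, B and AB, as described in the text.

```python
import itertools
import numpy as np

def saturated_ff8():
    """Saturated 8-trial two-level fractional factorial: 7 orthogonal +/-1 columns."""
    a, b, c = np.array(list(itertools.product([-1, 1], repeat=3))).T
    return np.column_stack([a, b, c, a * b, a * c, b * c, a * b * c])

def plackett_burman12():
    """12-trial pattern: cyclic shifts of an assumed generator row plus a row of minus signs."""
    gen = np.array([1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1])
    rows = [np.roll(gen, shift) for shift in range(11)]
    rows.append(-np.ones(11, dtype=int))
    return np.array(rows)

def hundred_sum_cos4(design, side_effect):
    """100 * sum, over all main-effect columns, of cos^4 of the angle to the side effect."""
    side = side_effect - side_effect.mean()
    norms = np.linalg.norm(design, axis=0) * np.linalg.norm(side)
    cosines = design.T @ side / norms
    return 100.0 * np.sum(cosines ** 4)

for name, d in [("fractional factorial, 8 trials", saturated_ff8()),
                ("Plackett-Burman, 12 trials", plackett_burman12())]:
    a, b = d[:, 0], d[:, 1]
    formal = hundred_sum_cos4(d, a * b)               # formal interaction AB
    distinctive = hundred_sum_cos4(d, a + b + a * b)  # equal parts of A, B and AB
    print(f"{name}: formal {formal:.1f}, distinctive {distinctive:.1f}")
```

Under these assumptions the fractional factorial gives 100 for the formal interaction and about 33 for the distinctive combination, while the 12-trial pattern gives about 11 and 23.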





DISCUSSION OF SATTERTHWAITE AND BUDNE PAPERS 171 


TABLE 5 
A situation which may arise under random balance; two main effects, their formal interaction, 
and the corresponding distinctive combinations. 

Main effects    Inter'n    Dist've comb's 
  A     B         AB       (AB)++   (AB)+-   (AB)-+   (AB)-- 

[Body of table (the sign pattern for each trial and the number of +'s and -'s in each column) is illegible in the source.] 

It is not easy to give comparative figures for random balance. Under random 
balance, interactions do not stay balanced, as Table 5 illustrates. Even the 
average value of the sum depends on the extent to which random balance is 
restricted. The figures given are very crude approximations for weak restriction. 
They suggest that measured in this particular way, random balance is better 
than fractional factorials but nowhere nearly as good as Plackett-Burman pat- 
terns. If simple two-factor interactions concern you in an experiment, Plackett- 
Burman patterns are unusually attractive. 

There is another serious question here which we shall have to face. Which 
side effects? Are formal interactions what concerns us most, or are we concerned 
with distinctive combinations, with situations where a particular level of one 
factor cooperates with a particular level of another factor to produce an unusual 
response? (This would classically be described by equal amounts of two main 
effects and one formal interaction.) The slop-over from main effects to distinc- 
tive combinations is also given numerical expression in Table 4 for the same 
saturated patterns. The same qualitative conclusions appear, though much less 
strongly. Somewhat similar results apply to slop-over from side-effects, either 
formal interactions or distinctive combinations, into main effects. The P-B 
constant-balance patterns always seem to look the best. I believe we will use 
many more of them. 

* * * * 







There are a number of areas where random balance can be useful today, and 
rather fewer where it will remain useful. There have been many negative reac- 
tions to random balance, and, if the stories are correct, many of these are justi- 
fied reactions to unwise claims that random balance could, or even should, be 
used everywhere. Where is it wise to use random balance? 

Satterthwaite staked out seven areas for random balance. Five of them seem 
reasonable for random balance to “have a place”, usually neither exclusively 
nor preeminently, for I think constant balance, once adequately packaged, will 
compete moderately strongly in all. The other two are something else again. 
Area (d) “when experimentation is conducted on regular manufacturing opera- 
tions, particularly trouble shooting operations” has to be carefully divided. 
Trouble shooting is one thing. If you can’t wait for the statistician to get there 
from headquarters, and if somebody on the scene can run random balance, this 
may locate the trouble. But where you are doing carefully planned and thought- 
out experimentation in regular manufacture, I suspect you can afford statistical 
thinking, including constant balance and all else we know how to use. Area (e) 
“continuous experiments to optimize processes, products and . . .” is even more 
doubtful. Only in earliest exploration will Satterthwaite’s points about flexibility 
and ‘the ease of stopping what you had intended and doing what now 
seems more sensible’ be important enough to make random balance worth while. 
The middle and later stages of optimization are almost certain to need constant 
balance. 

In the short run, I expect to see more random balance. In the longer run, 
constant balance, adequately packaged so that you don’t have to wait till the 
statistician comes, is going to take over much territory from random balance. 
Much, but not all. We have not replaced the control chart in many situations 
where, at the price of a lot more thought, something more precise and sensitive 
could be used. Similarly we will not completely replace random balance. We 
will not use it where it is important to detect effects of size one standard devia- 
tion. We will be careful to put the right kinds of warning signs on its label so 
that people will not feel they should use it where its use is unwise. 

Random balance seems to me to be a part of two of the current revolutions 
in statistical thinking: First, a return to an interest in the wider aspects of the 
data, growth of interest in procedures that are incisive, that lay the data open 
so that we can see what they look like inside, even though they do not give 
definite significance or confidence levels. This means emphasis on insight and 
understanding, rather than on “proven” knowledge. (This is not to deny a 
place for any type of technique, for the worst thing that can happen to statistics 
is for it, for us, to come to feel that there is but one set of problems and but 
one good way to tackle them.) The other revolution is an attempt to get an 
adequate hold on a variety of situations where the same data is examined from 
several points of view. (Attempts to balance bias against variance in regression 
estimation and the analysis of covariance, and Bucher’s Princeton thesis (1956) 
on recovery of intervariety information in lattice experiments are two other 
examples of the second revolution.) 

* * * * 







[Added late: Since I have been told that my oral presentation was interpreted 
in different ways by different people, I have tried to produce the more explicit 
statement which follows. J. W. T.] 

Most of the basic ideas underlying “random balance” (Make experimentation 
easy! Make analysis simple! Be guided by the data! Analyze the results as they 
come in, and stop when they speak clearly!) are not only good ideas, but are 
ideas of vital importance in certain applications. They need not be confined to 
random balance. Attempts to oversell random balance, to imply that all 
experiments can be “careless experiments”, have led to serious complications. For a 
time these complications threatened to lead to a monument above the grave of 
“random balance”. The two papers presented at this discussion are calm, 
collected and reasonable. If random balancers will at all times and in all places be 
as reasonable, random balance will continue to be used, mainly in areas where 
patterned experimentation has not been regularly practiced. Constant balance 
(and constant near-balance) techniques, slightly modified by random balance 
ideas, will, once the proper steps are taken to make their use almost as free and 
easy, replace “random balance” in much of this newly statisticized frontier. 
Just as the cowboy is more glamorous than the nester, random balance is more 
dashing and attractive than constant balance. But while the cowboy has almost 
disappeared (except for dude ranches), random balance will remain in many 
places where quick answers are needed, but high precision is not. 

Many people go into a conscious or unconscious “tizzy” when confronted 
with a situation involving many factors. Random balance is an easy way out, 
one surely much more helpful than a “tizzy”. 20% efficiency is so much better 
than negative efficiency as to make the difference between 20% and 100% of 
relatively minor importance. Of course, once both confidence in the face of, and 
competence in the handling of, many-factor situations has been reached, the 
difference between 20% and 100% becomes rewarding and important, and the 
role of random balance shrinks. 

There are those who say that random balance is the “wave of the future”, 
that most experimentation will come to use random assignment of levels or 
versions of each factor to trials. They are wrong. There are those who say that 
random balance is worthless, inefficient, and dangerous. They, too, are wrong, 
though not quite so wrong. Random balance has a continuing place, even though 
it is a relatively narrow one. Moreover, and more importantly “random balance” 
offers us many lessons, the least important or useful of which is the possibility 
of assignment of levels or versions to trials at random. If we are wise, we shall 
study the other lessons, and apply them to the great majority of situations 
where classical patterns will remain the best choice. Some of these classical 
patterns will be exactly as in 1925, 1935 or 1945; others will be modified or 
streamlined; all will make heavy use of constant balance. 

Some are more concerned with the ill effects of overselling random balance 
than with the good which can come from its lessons. Others, and I am one, are 
more concerned with the coming good. But all of us have twin responsibilities: 
to detect and counter overselling, and to study and put to use the lessons. 

In the long run, random balance will have had four important effects: 







174 G. E. P. BOX 


(1) It will have loosened up unnecessary straight-jackets on the use of con- 
stant balance. 

(2) It will have statisticized broad new areas. 

(3) It will have become the technique of choice in the more marginal of 
these areas. 

(4) Oversalesmanship of random balance will have retarded the progress of 
constant balance experimentation, to everyone’s loss, in a variety of areas where 
careful experiments will always be needed. 


G. E. P. Box 


Statistical Techniques Research Group 


Princeton University 


During our life time much use has been made of a propaganda technique which 
works as follows: A series of statements are made most of which are unexcep- 
tional but among them a statement or statements are sandwiched in which are 
in quite a different category. If we are not very careful we swallow the whole 
sandwich accepting the false with the true. 

The inclusion of many true statements makes it less easy to expose the unsound 
ones since the issues can be so easily confused in debate. Let me therefore begin 
by saying that I believe the only thing wrong with random balance is random 
balance. Most of the peripheral statements commonly made in presentation of 
this topic, although not new, are nevertheless true. Furthermore, lest I be 
accused of attacking ideas which I not only believe in but have for some time 
put forward myself, let me say that I believe that: 

i) There is an important situation where we are screening variables—that is 
where a large number k of candidate factors exist but probably only some of 
these have large effects, and our task at this stage of the investigation is to 
discover which ones have the larger effects. 

ii) If n experiments are to be run and the size of the experimental error justifies 
such a course, we may legitimately use what may be called screening designs 
where the number of constants which would have to be estimated if every factor 
had appreciable main effects and interactions far exceeds the number of runs. 
In fact, latin squares, graeco-latin squares, hypergraeco-latin squares, and 
fractional factorial designs are among the balanced designs which have been 
used for screening for a very long time. (For an interesting example see L. H. C. 
Tippett (1934) “Applications of statistical methods to the control of quality in 
industrial production”, Manchester Statistical Society, quoted by R. A. Fisher 
in Design of Experiments, Chapter V.) 

iii) The situation in which such arrangements are used is frequently such 
that groups of experiments should be performed in sequence and the data ought 
to be viewed from a number of different aspects and points of view. (We should 
‘iterate’ on the model.) Successive groups of experiments should be planned so 
as to elucidate those questions still in doubt. 







iv) There is still a good deal of room for research on how screening experiments 
ought to be analyzed and standard methods of analysis and standard models 
are not necessarily appropriate. 

None of these things have anything whatever to do with random balance 
designs per se for all these considerations apply equally to balanced screening 
designs. 

Modern scientific statistics is concerned with two things: data generation, or 
design, and data analysis. There is no doubt that of these the first is by far the 
more important. Suppose we can arrange by careful design to generate data so 
that it will yield 100 units of information. Then if we use the most efficient 
methods of statistical analysis and our assumptions are correct we will extract 
100 units of information by that analysis, and if we use quick and dirty methods, 
we may extract 90 units of information. However, if we are misled into using an 
arrangement which will only generate 20 units of information then however 
ingenious our method of analysis the most we can extract will be 20 units of 
information. It follows therefore that it is far more important to ensure that 
experiments are properly designed than it is to insist on elaborate methods of 
analysis. 

It is, I believe, not difficult to teach chemists and engineers the basic principles 
of statistical design and it is far less expensive to do this than to allow them to 
plan experiments in an inefficient manner. It is particularly unfortunate that, at 
a time when many companies are beginning to realize the necessity for instituting 
training courses in efficient data generation, they should be bombarded with 
propaganda which purports to prove that to make any serious study of the basic 
elements of the important subject of statistical design is an unnecessary waste 
of time. 

A question of over-riding importance therefore in the assessment of the 
random balance method is, ‘‘What is its efficiency?” I have heard of estimates 
of the efficiency of random balance given by Dr. Satterthwaite, but I confess 
that I have never understood the basis on which he arrived at these. I have 
therefore undertaken an objective assessment of the efficiency of these designs 
in some important cases. 

Let us suppose that we have k quantitative variables, whose levels are denoted 
by $x_1, x_2, \ldots, x_k$, and that we are interested in the linear effects of the k variables 
so that our model is 

\[ y = \beta_0 x_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + e \]

where $x_0 = 1$, and e is a random variable distributed independently from one 
experiment to the next having zero expectation and variance $\sigma^2$. Suppose that 
N > k trials are run in a random balance arrangement. That is, $n_i$ levels are 
chosen for the ith variable $x_i$, each level being repeated $r_i = N/n_i$ times but 
the levels being otherwise allocated entirely at random to the runs. Then I have 
considered the efficiency of these random balance designs in estimating the k 
coefficients 


i) when the results are analyzed by the method of least squares 







ii) when the results are analyzed by an alternative method favored by Dr. 
Satterthwaite. 


The measure of efficiency adopted in each case is the average value of the ratio 

\[ E_i = \frac{\text{Variance of estimate of } \beta_i \text{ using an orthogonal design}}{\text{Variance of estimate of } \beta_i \text{ using a random balance design}} \tag{1} \]


This ratio, expressed as a percentage, measures the number of units of informa- 
tion supplied by a random balance design per 100 units of information supplied 
by an orthogonal design, or alternatively, the number of runs from an orthogonal 
design which would supply the same information as 100 random balance runs. 

i) Least Squares Analysis. Irrespective of the number of levels $n_i$ chosen for 
the different variables it can be shown that 

\[ E_i = \frac{N - k}{N - 1}, \qquad i = 1, 2, \ldots, k. \tag{2} \]

It will be seen that so far as this method of analysis is concerned the efficiency 
of the random balance design in the area in which it has been recommended 
(namely, the number of factors k is large) is extremely poor. 

The above formula is perfectly general but its implications can perhaps be 
best realized by using it to compare a two-level random balance design (in 
which there are $\tfrac{1}{2}N$ minus ones and $\tfrac{1}{2}N$ plus ones allocated at random to each 
factor) with a two-level fractional factorial design in which the $\tfrac{1}{2}N$ minus ones 
and $\tfrac{1}{2}N$ plus ones are systematically allocated to obtain orthogonality. Suppose 
we wish to explore k = 15 factors in N = 16 runs, then the average efficiency 
of the random balance design as measured by this criterion is 
(16 − 15)/(16 − 1) = 1/15th that supplied by the fractional factorial. 
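
The least squares result can be checked by simulation. The sketch below is an illustration under stated assumptions rather than Box's own derivation: it draws two-level random balance designs, uses the standard fact that the least squares variance of a coefficient is $\sigma^2$ divided by the residual sum of squares of its column after regression on all the other columns, and averages the resulting efficiency. The function names and the number of simulated designs are arbitrary choices.

```python
import numpy as np

def ls_efficiency_formula(N, k):
    """Average least squares efficiency (N - k) / (N - 1) quoted in the text."""
    return (N - k) / (N - 1)

def ls_efficiency_simulated(N, k, n_designs=1000, seed=0):
    """Monte Carlo average efficiency of two-level random balance under least squares."""
    rng = np.random.default_rng(seed)
    base = np.repeat([-1.0, 1.0], N // 2)  # N/2 minus ones and N/2 plus ones
    ratios = []
    for _ in range(n_designs):
        columns = [np.ones(N)] + [rng.permutation(base) for _ in range(k)]
        X = np.column_stack(columns)
        for i in range(1, k + 1):
            others = np.delete(X, i, axis=1)
            coef = np.linalg.lstsq(others, X[:, i], rcond=None)[0]
            resid = X[:, i] - others @ coef
            # Var(b_i) = sigma^2 / (resid . resid); an orthogonal design gives sigma^2 / N
            ratios.append(resid @ resid / N)
    return float(np.mean(ratios))
```

For example, `ls_efficiency_simulated(16, 5)` should come out close to `ls_efficiency_formula(16, 5)`, that is 11/15. With k = 15 and N = 16 many simulated designs are singular or nearly so, which is itself a symptom of the inefficiency described in the text.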

In some writings on random balance attempts have been made to get improved 
arrangements by making certain of the factors balanced with respect to each 
other as well as to the dummy variable $x_0$. Suppose restraints are introduced so 
that the variable $x_i$ is orthogonal to p of the k − 1 remaining design vectors and 
not orthogonal to the remaining k − 1 − p then 

\[ E_i = \frac{(N - p) - (k - p)}{(N - 1) - p} = \frac{N - k}{N - 1 - p}. \]

Only when the variable $x_i$ is orthogonal to every one of the remaining k − 1 
variables (when p = k − 1) do we obtain a design of full efficiency. We notice 
that so far as this method of analysis is concerned it is basically the failure of 
the x's to be orthogonal one to the other which gives rise to the inefficiency of 
the design. 
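
A few values of the restricted-design efficiency make the point concrete. This is only a worked-arithmetic sketch of the expression (N − k)/(N − 1 − p) stated above; the function name is illustrative.

```python
from fractions import Fraction

def restricted_ls_efficiency(N, k, p):
    """Efficiency (N - k) / (N - 1 - p) when x_i is orthogonal to p of the other k - 1 vectors."""
    if not 0 <= p <= k - 1:
        raise ValueError("p must lie between 0 and k - 1")
    return Fraction(N - k, N - 1 - p)

# k = 15 factors in N = 16 runs: efficiency rises slowly until p is close to k - 1
for p in (0, 7, 12, 14):
    print(p, restricted_ls_efficiency(16, 15, p))
```

With k = 15 and N = 16 the efficiency is 1/15 with no restraints, still only 1/8 when half of the other vectors are orthogonal to $x_i$, and reaches 1 only at p = 14.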

ii) Analysis by an alternative method favored by Dr. Satterthwaite. Satterthwaite 
proposes that we should proceed by plotting the observations against each of 
the x’s separately. When the apparently largest effect is found he then calculates 
the residuals from the fitted relationship and plots the residuals against the 
levels of the remaining factors. He then selects the apparently next largest 
effect and so on. As a method of analysis this procedure is not of course new but 







under the descriptive title of ‘PARC’ analysis has been used for many years in 
the analysis of data from undesigned experiments. 

In this discussion we are supposing that the effects are linear so that it is 
the relative slopes of the plotted lines which are compared, that is, the quantities 

\[ b_i = \frac{\sum y (x_i - \bar{x}_i)}{\sum (x_i - \bar{x}_i)^2}. \]

Our problem is to determine the variance of the $b_i$ under variation of the y's 
and the x's. Denote the sums of squares and products in the following manner 

\[ \sum (x_i - \bar{x}_i)^2 = [ii], \qquad \sum (x_i - \bar{x}_i)(x_j - \bar{x}_j) = [ij]. \]

Then it is easily shown that over the variation in the y's the mean value $E_y(b_i)$, 
and the variance $V_y(b_i)$ of $b_i$ are given by 

\[ E_y(b_i) = \beta_i + \sum_{j \neq i} \frac{[ij]}{[ii]} \beta_j \tag{3} \]
\[ V_y(b_i) = \sigma^2 / [ii] \tag{4} \]


Over variation in the y's and the x's we obtain 

\[ E_{yx}(b_i) = \beta_i \tag{5} \]
\[ V_{yx}(b_i) = \frac{1}{[ii]} \left\{ \sigma^2 + \frac{1}{N - 1} \sum_{j \neq i}^{k} [jj] \beta_j^2 \right\} \tag{6} \]

Equations (3) and (4) refer to the behavior of the estimates for repetitions of 
a particular random balance design. The estimate $b_i$ is biased by every other 
coefficient present, and varies about this biased value with a variance depending 
only on $\sigma^2$. If on the other hand we think in terms of the repetitions obtained by 
taking a different randomly chosen random balance design each time, then the 
bias term in Equation (3) varies about zero taking sometimes positive values and 
sometimes negative values so that its effect is transferred to the variance of the 
estimate which now contains terms from every other coefficient present. 
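
The transfer of the bias into the variance can be seen in a small simulation. The sketch below is illustrative only: it assumes two-level factors coded −1 and +1 (so that every sum of squares about the mean equals N), normal errors, and a fresh random balance design on every repetition, and it compares the simulated variance of the single-factor slope with $\{\sigma^2 + \frac{N}{N-1}\sum_{j \neq i} \beta_j^2\}/N$, which is what the variance over y's and x's reduces to in that case. All names are hypothetical.

```python
import numpy as np

def simulate_parc_slope(beta, sigma, N, i, reps=4000, seed=0):
    """Mean and variance of the single-factor slope b_i over fresh random balance designs."""
    rng = np.random.default_rng(seed)
    beta = np.asarray(beta, dtype=float)
    base = np.repeat([-1.0, 1.0], N // 2)
    slopes = np.empty(reps)
    for r in range(reps):
        X = np.column_stack([rng.permutation(base) for _ in range(len(beta))])
        y = X @ beta + sigma * rng.standard_normal(N)
        xi = X[:, i]                      # balanced column, so its mean is zero
        slopes[r] = (y @ xi) / (xi @ xi)  # slope of y plotted against x_i alone
    return slopes.mean(), slopes.var()

def variance_over_designs(beta, sigma, N, i):
    """Two-level case of the variance over y's and x's, with every [jj] equal to N."""
    others = np.delete(np.asarray(beta, dtype=float), i)
    return (sigma ** 2 + N / (N - 1) * np.sum(others ** 2)) / N
```

For instance, with `beta = [1.0, 2.0, 0.5, 0.0]`, `sigma = 1.0` and `N = 16`, the simulated mean of the first slope should be close to 1.0 and its variance close to `variance_over_designs(beta, 1.0, 16, 0)`, far above the value 1/16 that an orthogonal design would give.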

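The measure of efficiency of the random balance design defined in Equation 
(1) is now obtained by dividing $\sigma^2/[ii]$, the variance of $b_i$ for an orthogonal 
design, by $V_{yx}(b_i)$ in Equation (6) 

\[ E_i = \left\{ 1 + \frac{1}{N - 1} \sum_{j \neq i}^{k} [jj] \frac{\beta_j^2}{\sigma^2} \right\}^{-1} \tag{7} \]
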
Once again we can see the implications of this formula by using it to compare 
a random balance two-level design with a fractional factorial. For such a design 
$[jj] = N$ $(j = 1, 2, \ldots, k)$ and our formula is approximately 

\[ E_i = \left\{ 1 + \sum_{j \neq i} \gamma_j^2 \right\}^{-1} \tag{8} \]

where $\gamma_j = \beta_j/\sigma$ is the ratio of the jth regression coefficient to the standard 
deviation. The apparent advantage of this type of analysis for a random balance 
design is that if only one of the $\beta_i$ were non-zero then the efficiency for that $\beta_i$ 
would equal unity whereas the efficiency by least squares would be (N − k)/ 
(N − 1). On the other hand, if in addition to the coefficient $\beta_i$ there were m 
other coefficients each of size $\sigma$ then the efficiency would only be 1/(1 + m). 

In practice it is difficult to know what is reasonable and fair to assume, but 
to allow the method to show itself in as good a light as possible we will proceed 
by making the type of assumptions which the supporters of the method adopt. 

iii) Exponential distribution of effects. In the promotion of random balance 
considerable emphasis is given to what is called the mal-distribution principle. 
According to this, one would expect that when a large number of factors were 
tested their effects would be distributed approximately in an exponential distri- 
bution. This has always seemed to me unduly optimistic since in practice (for 
example in the start-up of a new process) the few very largest effects are large 
enough to be obvious without any use of statistical methods so that by the time 
these methods are used we will be dealing at best with a truncated exponential 
distribution which will be considerably less favorable to random balance. We 
proceed however with the rather generous supposition that even the largest 
effects are so far undiscovered. As before, we will compare a two level random 
balance arrangement for testing 15 factors in 16 runs with the corresponding 
fractional factorial. Rather than talking in terms of the regression coefficients 
$\beta_j$ it is perhaps a little easier to talk in terms of the ‘effects’ as usually estimated 
in a two-level factorial. Denote by $\alpha_j = 2\gamma_j$ the ratio of the jth effect to the 
standard deviation $\sigma$. Designs of the size of 16 runs might reasonably be employed 
when the average size of the effects were between, say, $\sigma$ and $2\sigma$ in absolute 
magnitude. We are to suppose therefore that the distribution of the absolute 
magnitudes of the effects follows the law 

\[ p(\alpha) = \frac{1}{\bar{\alpha}} e^{-\alpha/\bar{\alpha}}, \]

where $\bar{\alpha}$ is 1 and 2. Now we can obtain the average magnitude of the largest, 
next largest, etc., of the effects from the tables of the average size of the ordered 
deviates from a sample of 15 deviates from the exponential distribution. Using 
formulae (2) and (6) we can then compare the standard deviations of the effects 
when 


i) using a fractional factorial design, 
ii) using a Random Balance design with least squares analysis, 
iii) using a Random Balance design with PARC analysis. 


To calculate the table we make the generous assumption that the effects are 
correctly eliminated in correct order using the random balance method. In the 
table only the first five or six effects are shown from which the general tendency 
can be seen. 

                Standard Deviation of Estimates, k = 15, N = 16 

Average size    Effects in order    Fractional          Random Balance 
of Effect       of magnitude        Factorial     Least Squares      PARC 

                                      ±0.50           ±1.94          ±1.12 
                                      ±0.50           ±1.94          ±0.94 
                                      ±0.50           ±1.94          ±0.84 
                                      ±0.50           ±1.94          ±0.74 
                                      ±0.50           ±1.94          ±0.68 

                                      ±0.50           ±1.94          ±2.72 
                                      ±0.50           ±1.94          ±2.12 
                                      ±0.50           ±1.94          ±1.74 
                                      ±0.50           ±1.94          ±1.46 
                                      ±0.50           ±1.94          ±1.24 
                                      ±0.50           ±1.94          ±1.06 

The average efficiency of the random balance method using PARC analysis 
in the detection of the largest effect is seen to be 


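\[ \left( \frac{0.50}{1.12} \right)^2 \approx 20\% \quad \text{when the average size of the effects is } 1.0\sigma \]

\[ \left( \frac{0.50}{2.12} \right)^2 \approx 6\% \quad \text{when the average size of the effects is } 2.0\sigma \]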
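
The arithmetic behind these figures can be sketched as follows. This is an approximate reconstruction, not Box's own computation: it assumes the standard result that the j-th largest of n exponential deviates with mean $\bar{\alpha}$ has expectation $\bar{\alpha} \sum_{i=j}^{n} 1/i$, takes $\sigma = 1$, uses the approximate efficiency $\{1 + \sum \gamma_j^2\}^{-1}$ with the larger effects assumed already eliminated, and so agrees with the printed table only roughly. The function names are illustrative.

```python
import math

def expected_ordered_effects(n, mean):
    """Expected values, largest first, of n ordered exponential deviates with the given mean."""
    return [mean * sum(1.0 / i for i in range(j, n + 1)) for j in range(1, n + 1)]

def sd_table(n_factors=15, n_runs=16, mean_effect=1.0, rows=5):
    """Rows of (average effect, SD fractional factorial, SD least squares, SD PARC)."""
    effects = expected_ordered_effects(n_factors, mean_effect)
    sd_ff = 2.0 / math.sqrt(n_runs)  # an effect is twice a coefficient; sigma = 1
    sd_ls = sd_ff / math.sqrt((n_runs - n_factors) / (n_runs - 1))
    table = []
    for r in range(rows):
        # gamma_j = alpha_j / 2; only the effects smaller than the current one remain
        sum_gamma_sq = sum((a / 2.0) ** 2 for a in effects[r + 1:])
        table.append((effects[r], sd_ff, sd_ls, sd_ff * math.sqrt(1.0 + sum_gamma_sq)))
    return table

for mean_effect in (1.0, 2.0):
    largest = sd_table(mean_effect=mean_effect)[0]
    efficiency = (largest[1] / largest[3]) ** 2
    print(f"average effect {mean_effect} sigma: PARC efficiency for largest effect {efficiency:.3f}")
```

Under these assumptions the efficiency for the largest effect comes out near 0.20 when the average effect is $1.0\sigma$ and near 0.06 when it is $2.0\sigma$.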


The fact must be faced that, whatever the method of analysis adopted, lack 
of orthogonality inevitably results in low efficiency for the Random Balance 
design. The fact that an unnecessary component of any vector x; exists in the 
space of the remaining vectors will inevitably cause this deficiency however 
ingenious the method of analysis adopted. It is clearly possible to detect effects 
using random balance designs, and the use of these methods in an organization 
which had done no systematic experimentation before might seem satisfactory. 
The results seem less encouraging when we realize that on the average we may 
be obtaining an efficiency of experimentation of about 1/5th or less of what 
is possible using orthogonal designs. 

It has sometimes been argued that although less efficient than standard 
designs, the random balance method has such great simplicity and flexibility 
that this compensates for its inefficiency. Such an assertion would I think have 
to be questioned if the experiments were not trivially inexpensive ones. It is 
understood from the users of these designs that they have been employed mainly 
in large and expensive operations. Indeed, such must be the case if large sums 
of money have been saved. One wonders in experiments costing several thousands 
or tens of thousands of dollars whether efficiencies of the order of twenty percent 
and less can be tolerated and whether perhaps a more efficient overall strategy 
might not be to buy a book on statistical design, or hire a statistician or statistical 
consultant to provide an efficient arrangement. 







180 J. S. HUNTER 


iv) Super Saturated Designs. Whatever the merits or de-merits of the methods 
of random balance, it has been claimed that this method is the only one available 
when we are conducting a screening experiment to detect whether there are 
large effects and where there are more possible variables than observations. 
Satterthwaite and his colleagues have done a considerable service to statistics 
in pointing out the importance of this situation, although I do not believe that 
even here random balance is the answer. It would seem that systematic designs 
of greater efficiency than random balance designs are not available at the moment 
only because they have not been looked for. Orthogonality can no longer be 
obtained in this situation since we cannot have more than k vectors mutually 
orthogonal in a k space. It seems intuitively obvious however that the most 
efficient design (or to use an alternative terminology ‘construction’) where 
there are more variables than observations, will be that in which the x vectors 
make maximum angles with one another. For instance, if we wish to study three 
variables in addition to the mean in three experiments we could use the design 


                  $x_0$   $x_1$   $x_2$   $x_3$ 
Experiment 1        1      -1       1      -1 
           2        1       1      -1      -1 
           3        1      -1      -1       1 


The four vectors $x_0$, $x_1$, $x_2$ and $x_3$ correspond to lines drawn from the center of 
a tetrahedron to its four apices and thus form maximum angles with one another. 
Other regular figures will clearly yield similar properties. 
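
The equal-angle property of this small design is easy to verify. The sketch below simply computes the cosines between the four column vectors of the three-experiment design given above; the function name is illustrative.

```python
import numpy as np

# Rows are experiments 1 to 3; columns are x_0 (the mean), x_1, x_2, x_3
design = np.array([[1, -1,  1, -1],
                   [1,  1, -1, -1],
                   [1, -1, -1,  1]], dtype=float)

def pairwise_cosines(design):
    """Matrix of cosines of the angles between the column vectors of a design."""
    unit = design / np.linalg.norm(design, axis=0)
    return unit.T @ unit

cosines = pairwise_cosines(design)
angle = np.degrees(np.arccos(cosines[0, 1]))
print(cosines.round(3))
print(f"angle between any two vectors: {angle:.2f} degrees")
```

Every pair of columns has cosine −1/3, the angle between lines joining the center of a regular tetrahedron to its vertices (about 109.5 degrees).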


J. S. HUNTER 


Statistical Techniques Research Group 
Princeton University 


In Mr. Budne’s paper an example is given in which twelve variables are 
studied simultaneously using a two level random balance design. The design 
consists of two half replicates of a $2^6$ factorial each containing thirty-two factor 
combinations. These two designs are randomly combined to give a twelve 
variable design containing thirty-two factor combinations. Because of the ran- 
dom assignment of the factor combinations between the two sets of six variables 
mutually orthogonal estimates of all twelve main effects are not available. For 
the record, several two level factorial designs are listed here, each containing 
thirty-two or less factor combinations, and each providing mutually orthogonal 
estimates of at least twelve main effects. Thus the reader, in encountering prob- 
lems similar to the one described by Mr. Budne, need not suffer the losses in 
precision or understanding that are the direct result of non-orthogonal estimates. 

The easiest of these two level designs to construct is the $2^{15-11}$ fractional 
factorial containing 16 experimental runs. This design is obtained by writing 
out all the column vectors associated with the main effects and the two, three 
and four factor interactions of a standard $2^4$ factorial. The resultant fifteen 
column vectors given in Equation (1) are then used to identify the levels of the 
15 controlled variables. All fifteen main effects are mutually orthogonal. The 
main effects are confounded with two factor interactions. Clearly any twelve 
of the column vectors would provide an orthogonal first order design. 
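
The construction just described can be sketched in a few lines. The column order used here (main effects first, then two, three and four factor interactions) is an assumption made for illustration and need not match the order of the columns in Equation (1); the function name is hypothetical.

```python
import itertools
import numpy as np

def saturated_design_16():
    """15 columns in 16 runs: main effects and all interactions of a 2^4 factorial."""
    base = np.array(list(itertools.product([-1, 1], repeat=4)))
    columns = []
    for order in range(1, 5):
        for subset in itertools.combinations(range(4), order):
            columns.append(np.prod(base[:, list(subset)], axis=1))
    return np.column_stack(columns)

X = saturated_design_16()
print(X.shape)                                   # runs by variables
print(np.array_equal(X.T @ X, 16 * np.eye(15)))  # all fifteen columns mutually orthogonal
# Variable 5 carries the product of variables 1 and 2, so that main effect
# is confounded with the two factor interaction of variables 1 and 2.
print(np.array_equal(X[:, 0] * X[:, 1], X[:, 4]))
```

Any twelve of the fifteen columns can then be assigned to twelve variables to give an orthogonal first order design in 16 runs.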


[Equation (1): design matrix of the $2^{15-11}$ fractional factorial, with rows indexed by run number (1 to 16) and columns by variable (1 to 15); the + and − entries are not recoverable from the source.]   (1) 



A first order design for studying the effects of sixteen variables in 32 runs in 
which all main effects are orthogonal and not confounded with two factor 
interactions can be constructed by combining the $2^{15-11}$ design given in Equation (1) 
with a second $2^{15-11}$ design in which all the signs of the elements are reversed. 
The sixteenth variable is added to the experimental program by adding a column 
of +’s to the first $2^{15-11}$ design and a column of −’s to the second. The design 
matrix for the resultant $2^{16-11}$ fractional factorial design is given in Equation (2). 

The experimental designs given in Eq. (1) and Eq. (2) are classified, respec- 
tively, as Type A and Type B first order designs in the paper “‘On the Experi- 
mental Attainment of Optimum Conditions” by G. E. P. Box and K. B. Wilson, 
JRSS, Ser. B., Vol. XIII, 1951. First order designs of Type A give orthogonal 
estimates of all main effects and unbiased estimates of the main effects proviled 
the assumption is true that two factor, and higher order, interaction effects are 
zero. First order designs of Type B give orthogonal estimates of all main effects 
unbiased by two factor interactions. Box and Wilson show that designs of Type
B can always be constructed from designs of Type A by repeating the Type A
design with all signs reversed and combining it with the original Type A design.
Furthermore, any variable held constant in the first set of experiments can be 
held at a new level in the second set, thus adding another variable to the full 
design. Thus starting with a Type A design to study (k - 1) factors in N = k







J. S. HUNTER 


[Equation (2): design matrix of the 2^(16-11) fractional factorial; rows are run numbers 1 to 32, columns are variables 1 to 16, entries are + and -.]


runs, a Type B design can be quickly constructed to study k factors in N = 2k 
runs as illustrated above in Eq. (1) and (2). 

An interesting collection of first order experimental designs of Type A is
given in the paper “Design of Optimum Multifactorial Experiments” by R. L.
Plackett and J. P. Burman, Biometrika, Vol. 33, 1946. These authors list first
order designs for k = 3, 7, 11, 15, ..., 99 factors in N = 4, 8, 12, 16, ..., 100
runs respectively. The estimates of all main effects are mutually orthogonal. 
Following Plackett and Burman the generating column vectors for these designs 
are given below for 12, 20 and 24 experimental runs. The reader is referred to 





DISCUSSION OF SATTERTHWAITE AND BUDNE PAPERS 


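Generating column vectors (each written here as a row, first element at the left):

k = 11, N = 12:   + + - + + + - - - + -
k = 19, N = 20:   + + - - + + + + - + - + - - - - + + -
k = 23, N = 24:   + + + + + - + - + + - - + + - - + - + - - - -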


the original paper for the twenty-eight run design. To construct a complete 
design a second column vector is obtained from the first by slipping the elements 
of the first column vector once, and placing the last element of the slipped 
column vector in first position. This procedure is repeated, slipping the second 
column vector one element to produce the third, and so on until k vectors are 
obtained. Finally a row of minuses is added to complete the design. Thus for 
the case of k = 11 factors in N = 12 runs we generate the following Type A 
first order design 


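                          Variables
 Run     1   2   3   4   5   6   7   8   9  10  11

   1     +   -   +   -   -   -   +   +   +   -   +
   2     +   +   -   +   -   -   -   +   +   +   -
   3     -   +   +   -   +   -   -   -   +   +   +
   4     +   -   +   +   -   +   -   -   -   +   +
   5     +   +   -   +   +   -   +   -   -   -   +
   6     +   +   +   -   +   +   -   +   -   -   -
   7     -   +   +   +   -   +   +   -   +   -   -
   8     -   -   +   +   +   -   +   +   -   +   -
   9     -   -   -   +   +   +   -   +   +   -   +
  10     +   -   -   -   +   +   +   -   +   +   -
  11     -   +   -   -   -   +   +   +   -   +   +
  12     -   -   -   -   -   -   -   -   -   -   -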


Following the rule given in the Box and Wilson paper, this design is quickly 
converted to a Type B first order design for studying k = 12 variables in N = 24
runs by repeating the above design with all the signs of the elements in the
design matrix reversed. 
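
The cyclic construction and its conversion to a Type B design can be sketched as follows. The generating vector used is the 12-run Plackett-Burman vector; the function names are illustrative assumptions, and the twelfth column (+ in the first half, - in the second) follows the Box and Wilson rule for adding a variable.

```python
from itertools import combinations

GENERATOR_12 = "++-+++---+-"  # Plackett-Burman generating vector, N = 12


def cyclic_design(generator):
    """Build the N-run Type A design from a generating column vector."""
    g = [1 if c == "+" else -1 for c in generator]
    k = len(g)
    # Column j is the generator slipped down j places, the elements that
    # fall off the bottom re-entering at the top.
    rows = [[g[(i - j) % k] for j in range(k)] for i in range(k)]
    rows.append([-1] * k)  # final row of minuses
    return rows


def columns_orthogonal(design):
    cols = list(zip(*design))
    return all(
        sum(a * b for a, b in zip(cols[i], cols[j])) == 0
        for i, j in combinations(range(len(cols)), 2)
    )


type_a = cyclic_design(GENERATOR_12)
# Type B: original design plus its mirror image, with one added variable.
type_b = [row + [1] for row in type_a] + [[-x for x in row] + [-1] for row in type_a]

print(len(type_a), len(type_a[0]), columns_orthogonal(type_a))  # 12 11 True
print(len(type_b), len(type_b[0]), columns_orthogonal(type_b))  # 24 12 True
```

Passing the 20-run or 24-run generating vector to `cyclic_design` gives the corresponding larger designs in the same way.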

A first order design of Type A for studying 31 variables in 32 runs is provided 
by writing down all the column vectors corresponding to the five main effects 
and all the two, three, four and five factor interaction effects associated with 
a full 2^5 factorial design and then identifying each of the 31 column vectors
with the 31 variables. 

In circumstances where N must be large to provide reasonably precise esti- 
mates, and k is small, the k column vectors chosen to define the design should 
be randomly selected from those available. Similarly the + and — notation 
should not always be chosen to indicate the high and low level respectively of 
a quantitative variable, or the presence or absence of a qualitative variable, 
but randomly assigned so as to generate alternative fractions of the full design. 
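
One way to read this recommendation, sketched under the assumption that the available columns are the 31 effect columns of a 2^5 factorial: pick k columns at random and give each a random overall sign. The function names and the `seed` parameter are hypothetical conveniences for illustration.

```python
import random
from itertools import combinations, product


def candidate_columns(m=5):
    """All 2**m - 1 effect columns of a full 2^m factorial, coded -1/+1."""
    runs = list(product((-1, 1), repeat=m))
    columns = []
    for r in range(1, m + 1):
        for subset in combinations(range(m), r):
            column = []
            for run in runs:
                value = 1
                for i in subset:
                    value *= run[i]
                column.append(value)
            columns.append(column)
    return columns


def random_first_order_design(k, m=5, seed=None):
    """Choose k columns at random and flip each by a random overall sign."""
    rng = random.Random(seed)
    chosen = rng.sample(candidate_columns(m), k)
    signed = []
    for column in chosen:
        sign = rng.choice((-1, 1))  # one sign per column, not per element
        signed.append([sign * x for x in column])
    return [list(row) for row in zip(*signed)]  # back to run-by-variable rows


design = random_first_order_design(k=6, seed=1)
print(len(design), len(design[0]))  # 32 6
```

Flipping the sign of a whole column changes which fraction of the full design is run but leaves the columns mutually orthogonal.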


AUTHOR’S RESPONSE TO DISCUSSIONS 


F. E. SATTERTHWAITE 


The discussions presented here on random balance are a gratifying milestone. 
These discussions are tangible evidence of expanding statistical applications in 
engineering and industry. Tonight we have Messrs. Cochran, Anscombe, Youden, 
Kempthorne, Tukey, and Budne on the platform with messages from Box and 
Hunter in absentia. The organizer of this meeting is, of course, Cuthbert Daniel. 
I should also like to include in spirit Dr. Samuel Brooks, Professor A. P. 
Dempster, Professor J. Kiefer, Dorian Shainin, and others for their work in 
this field. 

It is obvious that there is much room for disagreement among friends. I like 
George Box’s statement, “the only thing wrong with random balance is random
balance.” To me it means that random balance is a symbol of something big
and important. It means that random balance as a technique is too narrow for
adequate accomplishment. The big job is irrevocably initiated. The answers to
“if” and “when” are “yes” and “now”. The answer to “how fast” is probably
also beyond our control; it appears to be “very fast”. The question that is
important to us is “how well”? This question is still under our control. Much
hard work will assure sound progress. The group with us tonight is evidence 
that this hard work will be initiated and guided by the best minds. By and 
large the hard work will be done by younger men yet unknown. For this reason 
I mention Professor Dempster who already has prepared several significant 
(unpublished) manuscripts. 

What is this “big job” implied by George Box? Statistics is only incidentally 
involved. Kelvin’s definition of science is that until we can measure and express 
in reproducible numbers, our knowledge is a nebulous thing. Technology is a 
balance of experimentation and theory in a hen and egg relationship. Neither 
can possibly come first. Today theory is dominant. In some of our largest
technological fields we have a whole generation of engineers with no conception of
what inspired experimentation means. How many have been privileged to
have a desk within fifty feet of the office of Dr. Zay Jeffries, a giant of the last 
generation whose experimental genius was responsible for the growth stages of 
three major industries: aluminum alloys, tungsten lamps, and carbide tools? 
The balance must, and soon will, be restored. Experimentation will be a full 
and adequate partner of theory. 

Will statistics be essential in this renaissance of experimentation? The answer 
is “yes” by definition since statistics is the science of the interpretation of 
numbers. My interpretation of a common feeling among all discussants is a 
fear that random balance may win by default. I doubt that it will be a default. 
Random balance will have hard competition. But also tonight is evidence that 
random balance is capable of winning if it is a default and that random balance 
may be a hard competitor to beat. Actually, of course, there will be many winners, 
each with its own place as experience shows where that place is. 


Technical responses 


(A) Many questions raised by the discussants are to be covered in sections 
of my paper to be submitted for publication to TECHNOMETRICS. These 
questions will not be considered here since the background must be prepared 
first. The discussants’ comments will, of course, be helpful in making the final 
editorial revisions. How much of this additional material on random balance 
will survive the referees and Editor Hunter’s budget I cannot say. The types 
of questions that will be answered include: 

(a) Additional analysis methods. More sections of Part II on general analysis 
methods will appear. 

(b) Efficiency. A broad discussion on efficiency will be submitted but not 
all questions can be answered at this time. (I agree with the discussants’
arithmetic on efficiency but often disagree with its pertinence and their
interpretations.)

(c) Evolutionary Operation. A whole part will be submitted on this very 
important technique. EVOP has been expanded to REVOP, Random Evolu- 
tionary Operation. 

(d) Pooling and Multiple Test Bias. These problems have little to do with 
random balance per se. Several bias correction techniques will be submitted. In 
most cases these will be conservative bounds or semi-bounds. There is grave 
doubt as to the existence of unbiased correction techniques for some types of 
situations. 

(e) Non-Random Methods. Kempthorne asks about polygression analysis and 
other non-random balance polyvariable methods. Only a few of these will be 
covered. New methods of supersaturated design and analysis are being invented 
continually. For almost any specific problem it is easy to find design principles 
and analysis methods that are better than random balance. Such principles and 
methods are helpful for theoretical insight, but I rather doubt that the “better” 
methods can compete with random balance for most practical applications. The 
gains are not large enough. There are too many overriding theorems similar to
the 86 percent efficiency floor when the Wilcoxon method is compared to the
best possible method in the worst possible situation. 

(B) The discussants naturally disagree with us on many matters of opinion. 
Only time answers opinion questions and none of us will have a high batting 
average when the final score is posted. The discussants have not changed my 
opinions and I doubt that my rebuttal will change their opinions. Opinions are 
dominated by personal professional work, by application experience, and by 
motivating goals. Can someone tell me how to select an unbiased and random 
sample of these? Actually I detect one common thread to our disagreements. 
The discussants are right for the application fields in which they have done 
most of their work and to which they have made such important contributions. 
We, and I think also John Tukey, are worried more about domains in the wilder- 
ness that today are hardly touched by statistics or effective experimentation. 

The opinion questions that should not be belabored further here include: 

(a) The application areas in which each technique is or will become a dominant
one.

(b) The economic as opposed to statistical efficiency of each technique. 

(c) Can technical and operating personnel be trained to plan, conduct, analyze, 
and interpret experimental programs in many variables? My experience says 
they can if the cost of doubling the number of data sets collected is not a con- 
trolling factor. 

(d) I am much less optimistic than John Tukey that fixed designs (i.e., con- 
stant balance patterns) can be packaged in a way that will appreciably improve 
their competitive position. Such packaging is an industrial engineering job. 
Seven years’ association with a firm of industrial engineers has not taught me 
how to do such packaging. It has taught me how exceedingly difficult the job is. 
I wish it could be done. Random balance would be a major user of the results. 
We “love” to imbed fixed designs in our random balance designs. (I suppose 
that my use of the word “love” implies a subconscious truth that even we use 
random balance because we have to, not because we want to.)

(C) Until extensive literature on random balance becomes available, we must 
bear the onus of sins we do not commit; I hope. We have learned to grin and
bear it with a smile that is sometimes a little forced. A few examples: 

(a) Reread the ninth paragraph of Jack Youden’s discussion. We are more 
violently against the practices that Jack here attributes to random balance than 
he is himself. But do not jump to reverse logic. A factor adequately controlled 
and measured can never foul an experiment per se. We insist on control and 
measurement. We also ask for some information return from the expense of 
control and measurement. Thus our deliberate variation of the factor over an 
appropriate range, not an unwise range that “fouls” the experiment. Most 
importantly, we consider the data on the deliberately randomized factors in our 
analysis; we do not throw it in the waste basket. A most unpleasant experience 
is to have to point out that an effect, claimed to be non-significant by the original 
analyst, is in fact significant after removal of an effect which was deliberately 
randomized in planning the experiment and which was precisely recorded on the 
original data sheets. This happens! 




(b) John Tukey’s comment on “careless” experiments has a different re- 
buttal since there is a natural tendency to feel that coin tossing is a “careless” 
technique for decision making. When do we randomize the design and when do 
we use a fixed design? A policy decision is needed on this point. We do not always 
randomize. We recommend this “few word” policy statement: 


(1) Use any design principle you wish, provided you have a good reason. 
(2) If you have no good reason for using some other design principle, randomize.


Short, good policy statements imply more than meets the eye. Until we give a 
definition of a “good reason” that is concrete, not vague, this policy statement
is useless. Our definition is specific: 

A good reason is one you will write down and sign your name to. The sources 
of trouble in engineering experimentation, when run down, are usually in the
“careless” category. That is why this good reason policy statement is so effective.
Signed “good reasons” that are “careless” provide their own cure. The engineer
concerned is not responsible for experimental planning very long. This policy 
statement also solves another problem of practical experimental design. The 
dominant sources of major errors in experimental interpretation are 


(a) bias 


(b) accidental or “careless” 100 percent confounding with an unrecorded 
variable. 


When we add a strong insistence that every variable acting in the experiment be 
recognized and recorded these two risks are greatly reduced. A good reason is 
seldom accidentally confounded. A reason that introduces significant bias is by 
definition “not good”. Accidental confounding is no trivial problem. It is not a 
random accident. In any operating situation the pressures to systematize are 
great and, if not guarded against, will produce 100 percent confoundings that 
may be fatal to correct interpretations. 

Why randomize if a signable good reason is not available? We need protection 
against the subconscious “good reason” which is in fact a bad reason. Our
subconscious is stronger than we think. There is nothing more biased than an
analysis which ignores the reasons for a design. This is why a valid statistical 
analysis of unplanned data is seldom possible. “How to Lie with Statistics” 
applies to data planning as well as to data presentation. An analyst can be 
fooled into a wrong answer on a real problem if the reasons for the data design 
are hidden from him. Fear the subconscious reason in data planning. The only 
protections are a 


(a) written good reason 

(b) tossed coin that can have no reason at all. 
Any other course is “careless” planning. 

(c) Unsaturated designs. We never advocate use of random balance when 
an appropriate unsaturated design (number of unknowns less than the number 
of data sets) is available. Box points out the efficiency loss this involves and 
there is no compensating gain. But watch out for fake unsaturation. Only the 
real problem can define the potential unknowns in the model:


(1) Ignoring many of them is indefensible (unprofessional). 

(2) Assumptions regarding them are opinions, not science. A statistician 
advocating assumptions of this type essentially cuts his own throat. 

(3) Holding many variables constant makes any applications to other values 
of these variables pure opinion, with no supporting factual evidence. 

(4) 100 percent confounding does not remove the supersaturation or reduce 
the variation caused by the extra effects one iota. It just assures with 
100 percent certainty that any variation due to such effects will be mis- 
named. If a confounded effect is in fact significant, a conclusion that is 
just plain wrong is 100 percent certain. (Do not say subsequent experi- 
ments will be made to resolve the confounding until you include in your 
efficiency comparison the number of experiments required to do so and 
restrict your conclusion to those applications where sequentialization is 
practical.) 


The use of random balance when an appropriate unsaturated fixed design is 
available can be justified only as expedient. Do not, however, under-estimate 
expediency as a dominant and often controlling factor in experimental planning.

(D) A few remaining points demand a reply at this time: 

(a) Random unidentified variation. Every opponent of random balance be- 
comes elated when he first notices that in almost all random balance case hist- 
ories the random unidentified error is a minute fraction of the total variation. 
This is because we identify causes which heretofore were: 


(1) Left unidentified and thereby inflated the true error to the point that it 
became a problem, or 

(2) Artificially suppressed by “holding many variables constant”, which
reduces the residual variation at the price of all the risks that this
“anti-statistical” practice involves.


I think I can safely say that thirteen engineers on statistical consulting at Rath 
and Strong have never failed to explain three-fourths of this residual variation 
(15/16ths of the variance) whenever such reduction was important to solution 
of a real problem.
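
The parenthetical arithmetic can be checked directly, assuming (as the wording suggests) that “variation” here means the residual standard deviation. The symbol σ is introduced only for this illustration.

```latex
\[
\sigma_{\text{remaining}} = \left(1 - \tfrac{3}{4}\right)\sigma = \tfrac{1}{4}\,\sigma,
\qquad
\sigma_{\text{remaining}}^{2} = \tfrac{1}{16}\,\sigma^{2},
\qquad
\frac{\sigma^{2} - \sigma_{\text{remaining}}^{2}}{\sigma^{2}} = \frac{15}{16}.
\]
```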

We do, of course, in practice identify effects that are as small as 10 percent 
of the unidentified variation. 1600 data sets would be a typical size for such a 
program. 

But the issue here is much more important. In the majority of engineering
studies unidentifiable variation can be made so small as to be essentially zero
(my opinion; I do not ask you to agree). Is statistics to be confined to
unidentifiable variation which is a problem in only 1 percent of investigations?

(b) Case Histories. George Box opens his discussion with a very important 
point. Much of the credit given to random balance in case history success 
stories has nothing to do with random balance per se. This is why professionals
in random balance application strongly resist the tremendous pressure placed 
on them to publish actual case histories. Case histories are never a random,
always a biased, sample. Professional publication must therefore predominantly
use Monte Carlo examples that anyone can reproduce for himself. 

The presently available case histories on random balance also contain so 
many 100 percent confoundings as to become worthless as evidence (though not
worthless as propaganda). Let me name a few 98 percent confoundings to
reinforce George Box’s point:


(1) Almost all case histories are supersaturated and supersaturation is what 
produces efficiency gains by factors of 10 to 100. Highly fractionated factorial 
designs proved this long ago. Random balance is just making such gains routine 
rather than the exception. 

(2) The initial uses of random balance were under the guidance of a few 
exceedingly competent gentlemen. The case history data cannot discriminate 
between two possible explanations of success: 


(1) Management skill, 
(2) The technical value of the method. 


I assure you these gentlemen are competent enough to produce good results 
with weak methods. 

(3) The applications for which case histories are currently available are 
mostly ones where cream was waiting to be skimmed. Many other techniques 
could also have skimmed amounts worth bragging about. 

(4) Almost all applications used classical statistical methods in combination 
with random balance. Was random balance just a sales gimmick that gave
classical statistics a chance to be the work horse? I think a good two horse team 
carries the load more easily. 

(5) The most serious confounding in case history data is graphical analysis. 
Today graphical analysis is much more powerful for complex data than
arithmetic analysis, electronic computers notwithstanding. Random balance is not
necessary for graphical analysis and successes may well have been primarily
due to graphical analysis, not to random balance. If I personally had
to choose between graphical analysis and random balance, graphical analysis 
would win in a walk at this time. I can easily get along without random balance 
in any situation I know of. I cannot get along without graphical analysis today 
for data that is at all complex. Fortunately I will never have to choose between 
them. I can always use both if I have a reason to do so. 

(c) John Tukey’s Table 4 is the approach that will settle the question of the 
design efficiency of random balance in relation to best possible fixed balance 
designs for supersaturated problems. With little understanding of the math- 
ematical properties of “elongation” as an appropriate measure, I shall risk the 
following tentative comments: 


(1) I doubt that a fourth moment measure is fair to random balance. At 
least the fourth root should be taken to prevent “lying with statistics” 
by scale exaggeration. 

(2) I question somewhat the appropriateness of the examples tabulated. 




(3) It appears that Table 4 compares “all possible interaction” random 
balance with “two factor interaction” Plackett-Burman. If so, random 
balance is millions of times more supersaturated than the Plackett- 
Burman designs. 


Professor Dempster’s paper, “Random Allocation Designs” (unpublished) seems
to cover Tukey’s basic approach in detail and develops many basic theorems 
controlling the relative merits of all possible design principles. There is no 
question that maximum balance designs always exist that are better than ran- 
dom balance. My investigations, however, indicate the possible improvements 
become trivial with increases in complexity and sample size. 

(E) A few of Professor Kempthorne’s questions have not been adequately 
answered above. They are difficult to answer since they are not specific,
sometimes “careless”, and often refer to statements out of context:

(a) “I understand little of the mathematics of Dr. Satterthwaite’s presenta- 
tion, whereas I think I would if it were correct.” Professor Kempthorne has 
summarized the mathematics of random balance analysis very well in his equa- 
tions (1)-(4). Professor Kempthorne, however, does not understand his own 
mathematics. He says “we cannot have an equation (1) with more unknown 
parameters than observations”. Nonsense. Only equation (3) cannot have more
unknowns than observations but this is quite another matter. Very large num- 
bers of equations (3) can be formed from a single equation (1). 

(b) Out of context. “... discover ... 5-factor interactions with ... 20 obser- 
vations on 50 factors.” Twenty observations is a bare minimum for any random 
balance experiment. I have, of course, discovered 5-factor interactions but in 
that case I also happened to have 1600 observations available. (Do not say I 
needed 1600 observations. I did, however, need more than 20. In theory of 
course, one can evaluate 5-factor interactions with maybe 3 observations; but 
never in practice; the data and model would have to be precise to millions of 
significant figures. One can always trade off data and model precision for ob- 
servations. You cannot, however, get very far with neither. Randomization is 
essential if the trade off is to be made at all with objective validity.) 

(c) Professor Cochran pointed out at the meeting that the dictionary sup- 
ports my use of “classical” in preference to Kempthorne’s. 

(d) “In the (classical) randomized experiment our interest in the things 
randomized over is negligible ...” Negligible interest in variables acting in an
experiment is “careless” engineering. They may be nuisance and unwanted 
variables but do not carelessly consider them of negligible interest. If they are 
acting, they can and do cause all kinds of trouble. In my personal experience 
statisticians have a higher interpretation error rate than good experimental 
scientists. When the cause of the statistician’s error is run down it is often that 
he carelessly had a “negligible” interest in a variable that was in fact important.

(e) “Dr. Satterthwaite seems to say the purpose of random balance
experimentation is to reach a reasonable model.” Random balance experimentation
has no purpose at all per se. This is like saying that the purpose of numbers is 
to count. An investigation has a purpose and the purpose is different for each
investigation. A purpose of some investigations is to reach a reasonable model.
In other investigations the model and its evaluation are of no interest. Random 
balance is a tool. Being inanimate it can have no purpose. Only the user of the 
tool has a purpose. Random balance happens to be a useful tool for many
purposes. The user must decide if it is appropriate for his purposes and whether 
he, the user, is sufficiently skilled in using the tool so that the purpose will be 
accomplished. 

(f) Vagueness. “If Dr. Satterthwaite took a similar view, I would not quarrel 
with him. His extravagant claims carry little weight ...”. I am at a complete 
loss as to what “similar view” and as to what “extravagant claims”. Am I
supposed to agree with the apparent antecedent statement: “But it seems to me
incontrovertible that high-order interactions will occur as a result of confounding
with low order interactions or effects.” Far from being “incontrovertible”, this
is nonsensical and “careless” writing. The existence of a high-order interaction 
is a physical fact. High order interactions 


(1) can never occur (i.e. be turned on or off at will); they exist or do not exist. 
(2) can never be the result of confounding; only the experimental design and 
the data analysis can confound. 


Professor Kempthorne is worried about something quite real even though he 
has expressed his worries poorly. Random confounding does occur through the 
“f” element of Kempthorne’s equation (3). This is no problem. A properly 
calculated confidence risk coefficient evaluates the risk of such confounding 
precisely and accurately to any desired number of decimal places, a million 
decimal places if you wish to work that hard. This confidence risk evaluation 
is independent of interactions between effects evaluated and effects thrown into 
the f-elements, as I prove in my paper. I believe in statistical methods because 
I believe in the validity and usefulness of confidence risk statements. Random 
balance is the technique that can include the risk due to thousands of possible 
interaction effects in the confidence risk evaluation. This is a proven fact, not 
an extravagant claim. 

(g) Professor Kempthorne asks my reaction to his suggestion that levels be 
assigned at random in Plackett-Burman designs. I was taught in school in 1938 
that a Latin Square analysis is invalid if one does not assign the levels to the 
design at random. I would strongly criticize anyone who does not assign levels 
to any incomplete design at random. Otherwise the analysis is not worth the 
paper it is calculated upon. Is this a new fundamental of statistical analysis 
due to Professor Kempthorne or is he trying to trap me? 

(h) Out of context. Professor Kempthorne quotes me as saying, “the
interpretation of the effect of pressure is not influenced by the specific evaluation of
the temperature effect obtained.” Where did he lift this part of a sentence out
of context? From the list of assumptions that qualify my presentations in Part 
II. 

(i) Professor Kempthorne does “not think the pick-the-winner method is 
relevant .... It certainly does not lead to formulation of a model.” Of course it
is relevant since it is a pure random design method. Of course it does not
formulate a model. A non-parametric method is model free by definition. Model freedom
is the reason pick-the-winner is important as well as relevant. 

(j) Misquote. “It is stated that classical designs are limited to 5 or 10 factors.” 
I doubt that the reader will disagree with the statement actually made in my 
paper. 

(k) Does Professor Kempthorne have so little confidence in statistical methods 
that he thinks risk levels of six to ten powers of ten “are practically impossible 
to achieve”? Engineers attain such risk levels routinely every day. Will such 
engineers hire statisticians whose ability is limited to two powers of ten? “Im- 
possible” is a dangerous word to use, Professor Kempthorne. A thing already 
being done routinely is never “practically impossible”.


THOMAS A. BUDNE


In his discussion, John Tukey completely captured the significance of Random 
Balance. His reference to control chart usage in industry as a parallel situation 
is exactly to the point. Random Balance fills a particular set of needs, and when 
statistically superior experimental designs become as readily available, under- 
standable and usable for industrial people concerned with screening large num- 
bers of variables, without highly specialized assistance, then a second major 
step of statistical progress will have been made. 

With regard to some of the remarks made by other discussants, certain points
apparently need re-emphasis or clarification.


1. A Random Balance design is not restricted to two levels of each variable,
as other proposed alternative designs are unless one accepts severe
complications of design. Each of the case histories described involved at least one variable
with more than two levels. Very seldom can one completely avoid discrete 
variables, requiring more than two levels, in a full screening experiment. 

2. In my own experience, a screening experiment is called upon to isolate the 
major contributors to a condition of undesirably large variation. Had the dis- 
criminatory powers of trained investigators to dichotomize factors into those 
worth investigation been sufficient, many of the persistent industrial problems 
would not have persisted so long. My experience in the factories differs from
Dr. Youden’s experience in the laboratory on this point. 

If the effects are so large that as little as two or four tests would be sufficient 
to isolate the critical variable, then the technique of sequential plotting of test
results, as recommended, would permit the successful termination of the experi- 
ment in two or four tests. This is indicated in one of the case histories. It appears 
rather inappropriate to pick apart the synthesized illustration, which seeks to 
demonstrate a technique, on this score. 

3. Almost all Random Balance experiments which I have experienced have 
pointedly set the levels of input variables within normal experience conditions. 
The case histories are clear on this point. Common sense alone would dictate a
realistic course in this regard. It is not clear why Dr. Youden makes statements
to the contrary. 

4. The final proof is always in the eating. The effects of variables isolated by 
a Random Balance experiment or any other, should always be verified by a con- 
firming experiment or by some other appropriate means of deliberately turning 
the effect on and off. Everyone is in agreement that large effects can not easily 
escape detection if the contributing variables have been programmed into the 
experiment. The search for variables contributing lesser effects may certainly 
require more sensitive designs and the direction of a skilled statistician. None 
of us would use a yardstick and a vernier caliper interchangeably. Random 
Balance has proved itself in actual experience to be a very effective technique 
in the hands of the properly informed non-statistician. 










TECHNOMETRICS May, 1959 


Quick Analysis Methods for Random Balance Screening Experiments*


F. J. ANSCOMBE 
Princeton University 


A short expository account of random balance is given, in which some different 
types of sampling are distinguished. As a quick significance test of effectiveness 
of single factors, a simple analysis of variance method is recommended. For the 
sake of sensitivity, it is suggested that the number of levels of quantitative factors 
should preferably be less than five. The degree of unbalance of a random balance 
design is studied, largely through an example, and a desirable upper bound is sug- 
gested for the number of levels of any factor, namely one eighth of the total number 
of observations. 


1. RANDOM BALANCE DESIGNS 


Suppose it is desired to test f factors in a factorial experiment, each factor at 
some stated number (two or more) of levels, not necessarily the same for every 
factor. The total number N of treatment combinations in a complete replication 
is equal to the product of the numbers of levels of all the factors, and will be 
exceedingly large if f is large. Satterthwaite [3, 4] has suggested that a useful 
experiment with some small number n of experimental units or tests can be 
obtained if one test is made at each of a set of n treatment combinations drawn 
at random from the population of all N possible treatment combinations. The 
more orthodox procedure would be to choose for the n treatment combinations 
a highly systematic sample from the population of all possible combinations, 
namely a “fractional replicate”. One might compare this with using a Monte
Carlo computing procedure instead of orthodox non-stochastic computation. 

There is more than one way in which a random sample of n combinations can
be chosen. First, sampling may be either with or without replacement. If n is
much smaller than N (n ≪ N), replacement makes almost no difference; replacement
will be assumed below. Secondly, sampling may be conditional on each
factor’s being represented in the sample a prearranged number of times at each 
level. Thus, if the first factor is to be tested at three levels and if n is divisible 
by 3, we may impose the condition that all three levels should be tested equally 
often, with similar conditions for all the other factors. I call this conditional 
sampling. Alternatively, no such condition may be imposed, and we then have 
simple unconditional sampling. 


* Prepared in connection with research sponsored by the Office of Naval Research. A draft 
of this paper was presented at the Annual General Meeting of the Institute of Mathematical 
Statistics, Cambridge, Mass., August 27, 1958. 




The design may be written out as an array of n rows (one for each experimental 
unit or test) and f columns (one for each factor); the entries are the levels of the 
factors. This is called the design matrix. To obtain conditional sampling with 
replacement, prearranged numbers of each of the level-symbols for any one factor 
are distributed at random in the corresponding column (by shuffling cards, for 
example), independently of the allocation of levels in the other columns. To 
obtain unconditional sampling with replacement, the entries in each column are 
a random sample (independent of every other column) from some chance dis- 
tribution of levels. In either case, the entries in different columns are independent, 
and the factors are said to have random balance.* 
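
The shuffling procedure just described can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `random_balance_design`, the dictionary layout of the level counts and the seed are assumptions made for the example, not part of the paper. The level frequencies are those stated for the design of Table 1.

```python
import math
import random


def random_balance_design(level_counts, rng):
    """Build a design matrix by conditional sampling with replacement.

    level_counts: one dict per factor, mapping level symbol -> number of
    times that level must appear; every factor's counts must sum to n.
    Returns a list of n rows, each holding one level per factor.
    """
    n = sum(level_counts[0].values())
    columns = []
    for counts in level_counts:
        if sum(counts.values()) != n:
            raise ValueError("each factor needs exactly n level symbols")
        # The prearranged stock of level symbols for this factor.
        column = [level for level, c in counts.items() for _ in range(c)]
        rng.shuffle(column)  # shuffled independently of every other column
        columns.append(column)
    return [list(row) for row in zip(*columns)]


# Level frequencies stated for Table 1 (factors A to H, n = 12).
table1_counts = [
    {0: 4, 1: 4, 2: 4},              # A
    {0: 4, 1: 4, 2: 4},              # B
    {0: 4, 1: 4, 2: 4},              # C
    {0: 3, 1: 3, 2: 3, 3: 3},        # D
    {0: 2, 1: 3, 2: 2, 3: 3, 4: 2},  # E
    {0: 6, 1: 6},                    # F
    {0: 6, 1: 6},                    # G
    {0: 6, 1: 6},                    # H
]
design = random_balance_design(table1_counts, random.Random(1959))  # arbitrary seed
N = math.prod(len(c) for c in table1_counts)  # size of a complete replication
```

A different seed gives a different design matrix; only the column totals are fixed in advance.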


TABLE 1
Design matrix and yields: random balance design, conditional sampling with replacement, f = 8, N = 4320, n = 12.

Experimental unit number    1   2   3   4   5   6   7   8   9  10  11  12
Level of factor A           0   1   0   2   0   0   2   1   2   1   2   1

[The columns for factors B to H and the column of yields are illegible in this copy.]


There are eight factors, denoted by A, B, C, D, E, F, G, H, having various numbers of
levels, from two to five. Factor A has three levels, denoted conventionally by 0, 1, 2, each 
appearing four times, so that the entries in the first column of the design matrix were obtained 
by randomly distributing a stock of twelve symbols consisting of four 0’s, four 1’s, and four 2’s. 
Factors B and C also have three levels each, distributed similarly (but independently). Factor 
D has four levels, denoted by 0, 1, 2, 3, each appearing three times. Factor E has five levels, 
denoted by 0, 1, 2, 3, 4, of which 1 and 3 have been chosen to occur three times each and 
0, 2, 4 twice each. Factors F, G, H have two levels each, denoted by 0 and 1, each level of 
each factor occurring six times. 

On the right of the design matrix is shown a column of fictitious yields. Those for which 
factor F is at level 0 are a random sample from a normal population having mean 50 and 
standard deviation 10; the rest (factor F at level 1) are from a normal population having 
mean 90 and standard deviation 10. 


* If sampling is without replacement, the entries in different columns are not completely 
independent. The same will be true, presumably, if conditions are imposed on the correlations 
between columns in the design matrix. One might suppose that conditional sampling with 
replacement, of the simple kind described above and illustrated in Table 1, would appeal 
most to users, but other types of sampling have been considered. It is desirable that in published
work on random balance experiments the type of sampling, or method of writing down
the design, should be specified clearly. 







An example of a random balance design (conditional sampling with replace- 
ment) is shown in Table 1. For ease of illustration, n and f have been chosen 
to be a good deal smaller than they typically would be in practice. Although in 
some respects this is an unfavorable example, it does illustrate several general 
properties, and will be discussed in detail below. Note that with random balance 
designs there is no need for any particular arithmetic relation between the 
numbers n, f, N, as there would be for orthodox fractional replication. 

Such an experiment is easy to design. It would appear most likely to be satis- 
factory if it was a factor-screening experiment, in which most of the factors 
were expected to be irrelevant, and the primary object was to identify one or 
more factors that had some appreciable effect. In that case, the following simple 
type of analysis suggested by Satterthwaite might suffice. Suppose only one 
observation is made on each experimental unit, say a yield. Plot the yields 
against the levels of each factor in turn, obtaining f scatter diagrams. Because 
of the random balance (independent assignment of levels for each factor), it is 
legitimate to forget about all the other factors when looking at any one scatter 
diagram, i.e. one may validly test the significance of the regression of yield on 
level of any one factor without making allowance for the levels of the other 
factors. If in fact none of the other factors has any appreciable effect this test 
will be not only legitimate but efficient. Satterthwaite suggests that on the 
diagram showing the largest regression, if there is one, a regression curve should 
be drawn, deviations of the yields from it should be found and plotted against 
the levels of all the other factors, and the process then be repeated. (For a non- 
quantitative factor, the term “regression curve” is not quite appropriate. 
Deviations would be found from the mean yield for each level separately.) In 
this way “significant” factors are identified one by one. 

Thus random balance permits of a simple type of analysis of the observations, 
in which factors are considered singly. This is not necessarily the most sensitive 
possible method of analysis, but presumably it will be the more effective, the 
fewer the factors that influence yield, and fully effective if there is only one such 
factor. 

The first stage of such a graphical analysis of the data of Table 1 is shown in 
Fig. 1. The most pronounced regression is clearly on factor F. The next stage 
would be to subtract the mean yield for the corresponding level of F, roughly 
52 for level 0 and 93 for level 1, from the given yields, and plot these residuals 
against the levels of each of the remaining seven factors. 
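
The step of subtracting level means before the next round of plotting can be sketched as follows. The helper names and the small data set are hypothetical (they are not the yields of Table 1); the sketch only shows the arithmetic of forming residuals for a factor treated as non-quantitative.

```python
from collections import defaultdict


def level_means(levels, yields):
    """Mean yield at each level of one factor."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for level, y in zip(levels, yields):
        totals[level] += y
        counts[level] += 1
    return {level: totals[level] / counts[level] for level in totals}


def residuals_after_fitting(levels, yields):
    """Each yield minus the mean yield of its own level."""
    means = level_means(levels, yields)
    return [y - means[level] for level, y in zip(levels, yields)]


# Hypothetical two-level factor and yields, for illustration only.
example_levels = [0, 1, 0, 1, 1, 0]
example_yields = [50.0, 90.0, 54.0, 94.0, 95.0, 52.0]
example_means = level_means(example_levels, example_yields)
example_residuals = residuals_after_fitting(example_levels, example_yields)
```

The residuals would then be plotted against the levels of each remaining factor in turn.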


2. QUICK SIGNIFICANCE TESTS 


It may be helpful on occasion to supplement the visual inspection of scatter 
diagrams as described above by an objective significance test, while still con- 
sidering the factors only one at a time. 

With most types of experimentation it is no doubt fair to say that significance 
tests are inappropriate. Factors are included only when they are thought likely, 
even certain, to influence yield, and the purpose of the experiment is to measure 
responses, rather than primarily to detect the possible presence of responses. 
But in a screening experiment, the experimenter may fully expect that most
of the factors included (all of them, if he is out of luck) will have no appreciable
effect, and he may reasonably adopt the line that he will consider any factor to 
be irrelevant unless there is clear evidence to the contrary. The weight of such 
evidence is measured by a significance test.* The tests discussed here relate to 
each factor separately. If all f factors are without effect on yield, and if a signifi- 
cance test is made on each, the expected number of results “significant” at the 
5% level will be f/20, and so on.

Figure 1. Scatter diagrams derived from Table 1 (one panel for each of factors A to H; horizontal axes: factor levels).

Suppose the factor under consideration has a small number k of levels (2 ≤
k ≪ n). If the levels are not quantitative, the obvious test criterion is derived
from analysis of variance of the yields between and within levels. Even if the
levels are quantitative, this is still a reasonable criterion, since one cannot be
sure what sort of regression curve is correct. One might make a normal-theory
test based on the F-distribution, or (what usually comes to nearly the same 
thing) one might make a randomization test, following Welch [6], of the non- 
parametric hypothesis that the given set of yields has been associated at random 
with the given set of levels. Since the same set of yields will be considered re- 
peatedly in relation to each factor in turn, the randomization test is attractive. 

For the given set of yields (or possibly for the set of deviations of the yields 
from a previously fitted regression curve on another factor), let T denote the
total sum of squares about the mean and let R denote the residual sum of squares 
of deviations of the yields from level means. One might take as test criterion the 
ratio of residual mean square to total mean square. If this is called 1 — U, we 
have 


* It is not necessary, however, to approach this problem through significance tests. Beale 
and Mallows [1] have considered simultaneous least-squares estimation of all responses in the 
light of a prior probability assumption that most responses will be small. 





$$U = 1 - \frac{(n-1)\,R}{(n-k)\,T}. \tag{1}$$


The upper tail of the distribution for U corresponds to a significant association
between yields and levels. From Welch’s results ([6], p. 152) one finds, for random
association of levels with yields,

$$\mathcal{E}(U) = 0, \qquad \operatorname{var}(U) = \frac{2(k-1)}{(n+1)(n-k)} \left\{ 1 - \frac{K_4}{n K_2^2} \right\}, \tag{2}$$

where K₂ and K₄ are respectively the second and fourth sample cumulants or
“k-statistics” of the n yields, in Fisher’s terminology. (It is supposed here that
each of the k levels appears equally often; otherwise a further term given by
Welch needs to be added to the expression for var(U).) If the set of yields does
not look markedly unlike a sample from a normal population, the factor in curly
brackets can be ignored. If n ≫ k, (n − k)U + (k − 1) has approximately a
χ² distribution with (k − 1) degrees of freedom. The usual normal-theory test
is nearly the same as this. The ratio of the mean square “between levels” to the
residual mean square is

$$\frac{(n-k)U + (k-1)}{(k-1)(1-U)},$$

and this has the F-distribution with (k − 1) and (n − k) degrees of freedom.
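
As a numerical check on the relation between U and the usual variance ratio, the following sketch computes U from its definition (one minus the ratio of residual mean square to total mean square) and compares the ratio obtained from U with the ratio computed directly from the sums of squares. Function names and the small data set are hypothetical, chosen only for illustration.

```python
def sums_of_squares(levels, yields):
    """Return n, k, total SS about the mean (T) and residual SS within levels (R)."""
    groups = {}
    for level, y in zip(levels, yields):
        groups.setdefault(level, []).append(y)
    n, k = len(yields), len(groups)
    grand_mean = sum(yields) / n
    total_ss = sum((y - grand_mean) ** 2 for y in yields)
    residual_ss = sum(
        (y - sum(group) / len(group)) ** 2
        for group in groups.values()
        for y in group
    )
    return n, k, total_ss, residual_ss


def u_statistic(levels, yields):
    """U = 1 - (residual mean square) / (total mean square)."""
    n, k, total_ss, residual_ss = sums_of_squares(levels, yields)
    return 1.0 - (residual_ss / (n - k)) / (total_ss / (n - 1))


def f_ratio_from_u(u, n, k):
    """Variance ratio expressed through U, as in the text."""
    return ((n - k) * u + (k - 1)) / ((k - 1) * (1.0 - u))


def f_ratio_direct(levels, yields):
    """Between-levels mean square over residual mean square."""
    n, k, total_ss, residual_ss = sums_of_squares(levels, yields)
    return ((total_ss - residual_ss) / (k - 1)) / (residual_ss / (n - k))


# Hypothetical three-level factor with three observations per level.
demo_levels = [0, 0, 0, 1, 1, 1, 2, 2, 2]
demo_yields = [10.0, 12.0, 11.0, 20.0, 19.0, 22.0, 15.0, 14.0, 17.0]
demo_u = u_statistic(demo_levels, demo_yields)
```

The two routes to the variance ratio agree to rounding error, so U carries the same information as the ordinary F statistic for one factor.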

In Table 2 values of U are shown for all eight factors in the imaginary experi- 
ment of Table 1. The standard deviations of U have been calculated from 
equation (2). Only one U is clearly significant, that for factor F. Residuals from 
the level means for factor F have therefore been obtained, and U’s recalculated 
for all the other seven factors. Only one U (for factor A) now exceeds, barely, 
twice its standard error, and since the sampling distribution for U in that case 
has presumably positive skewness like a χ² with 2 degrees of freedom the deviation
cannot be considered remarkable. One will conclude that only factor F
has given clear evidence of affecting yield. This conclusion happily agrees with 
the way the fictitious yields were composed. 

A much quicker test than U is available when k = 2, namely Tukey’s compact 
two-sample test [5]. A count is made of the number of yields at one of the levels 
that are above all the yields for the other level, plus the number of yields for 
the latter level below all the yields for the first—provided both these numbers 
are positive; otherwise no count is made. Tukey has found that percentage 
points of the chance distribution of the total count are nearly independent of n, 
under random permutation of the level-symbols, assuming approximately ½n
observations at each level. For n = 12 he gives 7 as the two-sided 5% point 
and 9 as the two-sided 1% point. From Fig. 1 we obtain the following counts: 
for F, 12; for G, 3; for H, 3. The first is clearly significant, the others not. After 
subtracting the level means for F, we obtain: for G, no count; for H, 3; neither 
significant. 
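
The counting rule of the compact two-sample test, as worded above, can be sketched as below. The function name and the data are hypothetical, and the sketch assumes there are no tied yields (ties would need a convention not discussed here).

```python
def tukey_count(sample_a, sample_b):
    """Total end count for two samples, following the rule in the text.

    Counts the yields in the sample holding the overall maximum that lie
    above every yield of the other sample, plus the yields of the other
    sample that lie below every yield of the first. If either number is
    zero, no count is made (returned here as 0). Assumes no ties.
    """
    if max(sample_a) >= max(sample_b):
        high, low = sample_a, sample_b
    else:
        high, low = sample_b, sample_a
    above = sum(1 for y in high if y > max(low))
    below = sum(1 for y in low if y < min(high))
    if above == 0 or below == 0:
        return 0
    return above + below


# Hypothetical yields: complete separation of two groups of six.
separated = tukey_count([88, 91, 95, 79, 102, 85], [48, 55, 61, 42, 50, 57])
# Hypothetical yields: one sample holds both extremes, so no count is made.
no_count = tukey_count([1, 5, 9], [2, 6, 8])
# Hypothetical yields: partial overlap.
partial = tukey_count([10, 11, 3], [1, 2, 9])
```

With complete separation of six yields against six the count is 12, the value quoted above for factor F.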

TABLE 2
Analysis of yields in Table 1: significance tests corresponding to the graphical analysis of Fig. 1.

Factor         A       B       C       D       E       F       G

Analysis of original yields
U            −0.11    0.22   −0.21    0.40   −0.04    0.80   −0.07
s.d.(U)       0.20    0.20    0.20    0.26    0.31    0.13    0.13

Analysis of residuals after fitting F
U             0.41    0.18   −0.16   −0.05    0.41            0.05
s.d.(U)       0.19    0.19    0.19    0.25    0.31            0.13

For a quantitative factor whose levels can be adjusted finely at will, it is not
necessary that the number of levels k should be small. The levels tested in the
experiment could even be all different (k = n). In that case, a test criterion
that suggests itself as being easy to calculate and independent of any assumption 
concerning the nature of the regression curve is one based on squared successive 
differences. Let the yields be arranged in ascending order of the levels of the 
factor, and let S denote the sum of squares of the (n — 1) successive differences 
of the yields in that order. Then a test criterion corresponding to U above is

$$V = 1 - \frac{S}{2T}, \tag{3}$$

with the upper tail corresponding to significant regression. Young ([7], pp.
294-5) has found the first four moments of the distribution for V, under random
reordering of the set of yields. In particular,

$$\mathcal{E}(V) = 0, \qquad \operatorname{var}(V) = \frac{2n - 3 - m_4/m_2^2}{2n(n-1)}, \tag{4}$$

where m₂ and m₄ denote respectively the second and fourth sample moments
of the yields (defined as sums of second or fourth powers of deviations from
the mean, divided by n). If the set of yields does not look markedly unlike a
sample from a normal population, we shall have approximately

$$\operatorname{var}(V) \approx \frac{1}{n},$$

and the distribution of V is nearly normal.
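
For a small set of yields the moments of V under random reordering can be checked by complete enumeration. The sketch below takes V as one minus S/(2T), with S the sum of squared successive differences and T the total sum of squares about the mean. The closed form used for comparison, (2n − 3 − m₄/m₂²)/(2n(n − 1)), should be treated as an assumption that the enumeration is there to test; the function names and the six yields are hypothetical.

```python
from itertools import permutations


def v_statistic(ordered_yields):
    """V = 1 - S / (2T) for yields arranged in order of factor level."""
    n = len(ordered_yields)
    mean = sum(ordered_yields) / n
    total_ss = sum((y - mean) ** 2 for y in ordered_yields)
    successive_ss = sum(
        (a - b) ** 2 for a, b in zip(ordered_yields, ordered_yields[1:])
    )
    return 1.0 - successive_ss / (2.0 * total_ss)


def permutation_moments_of_v(yields):
    """Mean and variance of V over all n! orderings (small n only)."""
    values = [v_statistic(p) for p in permutations(yields)]
    mean_v = sum(values) / len(values)
    var_v = sum((v - mean_v) ** 2 for v in values) / len(values)
    return mean_v, var_v


def closed_form_variance(yields):
    """Assumed closed form: (2n - 3 - m4/m2^2) / (2n(n - 1))."""
    n = len(yields)
    mean = sum(yields) / n
    m2 = sum((y - mean) ** 2 for y in yields) / n
    m4 = sum((y - mean) ** 4 for y in yields) / n
    return (2 * n - 3 - m4 / m2 ** 2) / (2 * n * (n - 1))


# Hypothetical yields; 6! = 720 orderings are enumerated.
sample_yields = [3.0, 7.0, 8.0, 12.0, 20.0, 21.0]
mean_v, var_v = permutation_moments_of_v(sample_yields)
formula_var = closed_form_variance(sample_yields)
```

Enumeration is feasible only for very small n; for realistic n the closed form or the normal approximation would be used instead.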

Another quick test is to group the levels together into a small number of 
groups and use U. But in that case the number of levels would have been better 
small to start with. How many levels are advisable, if the U-test is to be used? 
The variance of U increases with k. If we could be sure that the response to the 
factor (if any) would be nearly linear, there should be only two levels, spaced 
wide apart. Often, however, response curves have a maximum, or a noticeable 
curvature, and three or four, or even five, levels might be thought safer and 
more interesting. 

Now if in fact there is a pronounced regression of yield on level, of the sort
just mentioned, and if the levels of the factor are distributed with equal spacing
over a certain fixed interval, the criteria U (with k small) and V (with k = n) 
have roughly equal expectations, for they are approximately the same function 
of the ratio of residual variance of yields about the regression curve to gross 
variance. But on the null hypothesis V has a much greater variance than U. 
(For example, if n = 50 and U is based on as many as 5 factor levels, i.e. k = 5,
the variance of V is roughly five times that of U.) It follows that V gives a 
much less powerful test than U. 

The following conclusions may be drawn concerning quantitative factors. 
Suppose we are willing to assume that if any of the factors has an appreciable 
effect on yield the effect can be represented by a low-degree polynomial regression 
curve with not more than one maximum. Then 

(i) no use should be made of the squared successive difference criterion V 
(which effectively measures serial correlation), and 

(ii) analysis will be easier and nothing lost if the number of levels for each 
factor is small, not more than 5, preferably 3 or 4. 


3. HOW MUCH BALANCE?


If two or more of the factors tested in a random balance experiment do in 
fact have a substantial effect and the experimenter becomes aware of this, he 
will wish to estimate the responses to those factors. At this point it is of interest 
to inquire how far the effects of the factors can be disentangled, that is, how 
close the design comes to being orthogonal. The fact that the degree of nonorthog- 
onality or unbalance is random can be made the basis for an objection to the 
whole notion of random balance designs. Such designs may work well on the
average, but should I trust to one now on this occasion? A similar objection
can be raised against Monte Carlo methods of computation. And a similar 
objection again can be raised against randomizing the layout of an experiment 
having a conventional systematic design; though there the randomization seems 
less drastic than in a random balance experiment. 

Let us consider further the imaginary experiment of Table 1. The last three
columns of the design matrix each contain six 0’s and six 1’s. It could have 
happened, by an accident of randomization, that two of these three columns 
were identical, or identical except for a consistent interchange of 0’s and 1’s. 
In that case the two corresponding factors would have been completely con- 
founded; if a pronounced response was associated with them, we could not tell 
at all (without further experimentation or background knowledge) which of 
the two factors was responsible. Similarly, it could have happened that two of 
the first three columns (which each contain four 0’s, four 1’s and four 2’s) were 
identical, or identical except for a consistent interchange of levels; again the 
two corresponding factors would have been completely confounded. It is easy 
to calculate that any such perfect confounding is improbable, and in fact it 
has not happened in the actual layout of Table 1. 

At the other extreme, two columns of the design matrix may have perfect 
orthogonality, in the sense that, associated with any chosen level of one of the 
factors, all the levels of the other factor appear in the same relative proportion
as they do in the experiment as a whole. In Table 1, the first and sixth columns
(for factors A and F) have this perfect orthogonality, for opposite the four
0’s in the A-column there appear equal numbers of 0’s and 1’s in the F-column;
and similarly for the four 1’s and for the four 2’s in the A-column. In fact, six 
of the twenty-eight possible pairs of columns in the design matrix of Table 1 
are orthogonal, namely A and F, A and H, C and F, C and G, F and G, G and H. 
If two columns are orthogonal, the corresponding factors are completely uncon- 
founded; if one of them has an effect, it cannot cause the other to appear to 
have an effect. If all columns were mutually orthogonal, the response to each 
factor could be estimated by least squares independently of the responses to 
all the other factors; and this is what happens in orthodox full factorial experi- 
ments. 
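
The definition of perfect orthogonality given above amounts to a proportional-frequency condition on the two-way table of level counts. A minimal sketch of that check follows; the function name and the short example columns are hypothetical.

```python
from collections import Counter


def is_orthogonal(col_x, col_y):
    """True if, at every level of X, the levels of Y appear in the same
    proportions as in the column as a whole (proportional frequencies)."""
    n = len(col_x)
    count_x = Counter(col_x)
    count_y = Counter(col_y)
    joint = Counter(zip(col_x, col_y))
    # Integer form of joint/n == (count_x/n) * (count_y/n), avoiding fractions.
    return all(
        joint[(x, y)] * n == count_x[x] * count_y[y]
        for x in count_x
        for y in count_y
    )


# Hypothetical columns: a three-level factor against two two-level factors.
three_level = [0, 0, 1, 1, 2, 2]
balanced_two_level = [0, 1, 0, 1, 0, 1]
unbalanced_two_level = [0, 0, 0, 1, 1, 1]
orthogonal_pair = is_orthogonal(three_level, balanced_two_level)
confounded_pair = is_orthogonal(three_level, unbalanced_two_level)
```

Applied to every pair of columns of a design matrix, such a check lists the orthogonal pairs of the kind enumerated above.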

In Table 1, no accident of randomization could possibly have made all eight 
columns mutually orthogonal,* but with luck every pair of columns may be 
not far from orthogonal. How can one measure the lack of orthogonality of two 
columns? If both the factors are at two levels only, a natural measure is the 
square of the ordinary correlation coefficient between the column entries. It is 
easy to see that the result will be the same whatever numbers have been written 
to represent the levels. In Table 1 the levels of the two-level factors are denoted
by 0 and 1, but any other pair of unequal numbers could have been used for
either factor. This squared correlation coefficient, ρ² say, is equal to the loss of
efficiency, due to nonorthogonality, in estimating the response to each factor
when both responses are simultaneously estimated by least squares (all other
factors being ignored). That is, the error variance for each response is equal to
what it would have been if the two factors had been orthogonal, divided by
1 − ρ². We can also interpret ρ² as a “coefficient of influence”, as follows. If
one of the factors, Y say, has a real effect, while the other, X, does not, ρ² is
the proportion of the real effect of Y (squared response) that reappears as an
apparent effect of X, if X is considered separately, with no allowance made for
the nonorthogonality of X and Y.
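
A small numerical illustration of the squared correlation between two two-level columns, its invariance under recoding of the levels, and the resulting inflation of error variance. The columns are hypothetical (twelve rows, six at each level), and the function names are introduced only for this sketch.

```python
def squared_correlation(x, y):
    """Square of the ordinary correlation coefficient between two columns."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def variance_inflation(rho_squared):
    """Error variance relative to the orthogonal case: 1 / (1 - rho^2)."""
    return 1.0 / (1.0 - rho_squared)


# Hypothetical two-level columns, six 0's and six 1's each.
col_x = [0] * 6 + [1] * 6
col_y = [0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1]
rho2 = squared_correlation(col_x, col_y)

# Recoding the two levels of Y with any other pair of unequal numbers
# leaves the squared correlation unchanged.
recoded_y = [5 if v == 1 else -2 for v in col_y]
rho2_recoded = squared_correlation(col_x, recoded_y)
inflation = variance_inflation(rho2)
```

Here the squared correlation is 1/9, so each response is estimated with error variance 9/8 times what an orthogonal pair of columns would give.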

A similar squared correlation coefficient can be calculated between any pair 
of columns of the design matrix, even if the factors concerned have more than 
two levels. But now the result will depend on the numerical scoring of the levels, 
and will relate to particular single components of the responses to the factors. 
If the numerical scoring shown in Table 1 is used, the correlations will relate 
to the linear components of the responses, on the assumption that each factor 
is quantitative and its levels are a linear function of the level-symbols. All the 
coefficients obtained in this way from the design matrix of Table 1 are shown in 
Table 3. 

The following proposition† can easily be proved: If the levels of two factors
are represented in a design matrix by any numerical symbols (not all equal), 


* Two coefficients are needed to specify the response to factor A, and two more for each 
of B and C, three for D, four for E, and one each for F, G and H; sixteen coefficients in all. 
Among the twelve observations only eleven independent comparisons can be made. All sixteen 
coefficients cannot therefore be estimated independently. 

† Dr. C. L. Mallows informed me of this result, for the case of factors at 2 levels.




TABLE 3
Squared correlation coefficients between pairs of columns of the design matrix in Table 1.

          B        C        D        F    G       H

A       0.250    0.016    0.133      :    :     0.000
B                0.141    0.008      :    :     0.042
C                         0.008      :    :     0.167
D                                               0.000
E                                               0.242
F                                               0.111
G                                               0.000


the average value of the squared correlation coefficient between the two columns, 
under random permutation of the entries in either column, is equal to 1/(n − 1),
n being the number of rows. The various levels need not appear equally frequently. 
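
The proposition can be verified by brute force for a small number of rows. The sketch below averages the squared correlation over all orderings of one column; the columns are hypothetical and deliberately have unequal level frequencies, and the function names are introduced only for illustration.

```python
from itertools import permutations


def squared_correlation(x, y):
    """Square of the ordinary correlation coefficient between two columns."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def mean_squared_correlation(x, y):
    """Average of the squared correlation over every ordering of column y."""
    values = [squared_correlation(x, p) for p in permutations(y)]
    return sum(values) / len(values)


# Hypothetical columns with n = 6 rows and unequal level frequencies.
column_x = [0, 1, 1, 2, 2, 2]
column_y = [0, 0, 0, 0, 1, 1]
average_rho2 = mean_squared_correlation(column_x, column_y)
expected = 1.0 / (len(column_x) - 1)
```

Repeated orderings of equal symbols are counted more than once by `permutations`, but every distinct arrangement is repeated equally often, so the average is unaffected.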

Thus the expected value of all entries in Table 3 is 1/11, and the average 
value of the 28 entries is in fact almost exactly that, namely 0.0906. The in- 
dividual values range from 0 to 0.375. 

However, if a factor is at more than two levels, we shall most likely be interested 
in more than just one particular component of its effect. If the levels are quanti- 
tative, the linear component of response may be the most important, but curva- 
ture is interesting too. If the factor is qualitative, probably no single contrast 
is known in advance to be more interesting than all others. We might ask, 
concerning two factors, between what limits ρ² would lie, if all possible components
of the effects of the factors were considered. This is a question of canonical
correlations. It is convenient to think in terms of coefficients of influence of one
factor on another, as defined roughly above and more explicitly in the Appendix.
The influence of Y on X depends on the nature of the response to Y, if Y is
at more than two levels; there is usually a range of possible values for the influence
coefficient. If X and Y are at an unequal number of levels, the range of values 
for the influence of Y on X is not necessarily the same as that for X on Y. In 
Table 4 are shown the greatest and least values of the influence coefficients for 
the design matrix of Table 1. Also shown is an average value for each coefficient, 
appropriate as an expectation when all patterns of response to the influencing 
factor are judged equally likely a priori. Where only one entry is shown, it is 
the only possible value, and so at once the lower limit, upper limit and average. 

As an example to illustrate the meaning of Table 4, consider the influence
on factor B by factor C. The coefficient may come anywhere from 0 to ¼. If the
true response to C is proportional to 1 for level 0, −1 for level 1, and 0 for level 2,
while the true response to B is zero, it is easy to see that no apparent response
to B is induced by C, and the influence coefficient is 0. If, on the other hand,
the true response to C is proportional to 1 for levels 0 and 1 and −2 for level 2,
some of this response reappears as an apparent response to B, namely 1 for
level 0 and −½ for levels 1 and 2. The sum of squares for the apparent response
to B is one-quarter that for the true response to C, and so the influence coefficient
is ¼. For other true responses to C, the influence coefficient will lie between 0
and ¼; the average is ⅛.

TABLE 4
Lower and upper limits, and average values, of coefficients of “influence” of one factor on another, calculated from the design matrix in Table 1. (Rows: influenced factor, “on factor”; columns: influencing factor, “by factor”. Each entry gives the range of the coefficient with its average in parentheses, e.g. 0.000-0.250 (av. 0.125); the individual entries are not reproduced here.)

As another example, consider the influence on factor E by factor D. For three
possible (mutually orthogonal) true responses to D, the induced apparent
responses to E, and the ratios of sums of squares of E-responses to D-responses,
are as follows.


True response to D      Apparent response to E      Ratio of sums of squares
  0   1   2   3           0   1   2   3   4          (influence coefficient)

  1   0   0  −1           0   0   0   0   0                   0
 −1   0   2  −1           2   0  −1   0  −1                   ⅔
 −1   3  −1  −1          −1  −1  −1   3  −1                   1


The average of the three canonical influence coefficients on the right is 5/9 or
0.556, and this is quoted in Table 4 as the average. The influence on factor D
by factor E can be studied similarly. It is possible to find four orthogonal possible
true responses for E, such that the influence coefficients on D are 0, 0, ⅔, 1, and
the average of these is 5/12 or 0.417, given in the table.

The following proposition, analogous to the one already quoted, can be 
proved, as indicated in the Appendix: The influence coefficient of a factor Y 
on a factor X, averaged over permutations of entries in the columns of the design 
matrix, is equal to (k — 1)/(n — 1), where k is the number of levels of factor X. 
The k levels of X need not appear equally frequently; the number of levels of 
Y is immaterial. 
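
The influence coefficient and its permutation average can be illustrated numerically. In the sketch below the coefficient is computed as the share of the sum of squares of the true response (to Y) that is reproduced by the level means of X. The function names and the data are hypothetical; the twelve-row arrangement is merely one layout consistent with the B-by-C example in the text, not the actual columns of Table 1.

```python
from itertools import permutations


def influence_coefficient(x_levels, y_effect):
    """Sum of squares of the apparent response to X, as a fraction of the
    sum of squares of the true row-by-row effect attributable to Y."""
    n = len(x_levels)
    mean = sum(y_effect) / n
    centered = [v - mean for v in y_effect]
    total_ss = sum(v * v for v in centered)
    groups = {}
    for level, v in zip(x_levels, centered):
        groups.setdefault(level, []).append(v)
    apparent_ss = sum(sum(g) ** 2 / len(g) for g in groups.values())
    return apparent_ss / total_ss


def permutation_average(x_levels, y_effect):
    """Average influence over every ordering of the X column (small n only)."""
    values = [influence_coefficient(p, y_effect) for p in permutations(x_levels)]
    return sum(values) / len(values)


# Hypothetical layout giving apparent responses 1, -1/2, -1/2 at the three
# levels of X when the true response is 1, 1, -2 at the levels of Y.
x_twelve = [0] * 4 + [1] * 4 + [2] * 4
effect_twelve = [1, 1, 1, 1, 1, 1, -2, -2, 1, 1, -2, -2]
worked_example = influence_coefficient(x_twelve, effect_twelve)

# Hypothetical six-row case with k = 3: the average should be (k-1)/(n-1).
x_column = [0, 0, 1, 1, 2, 2]
y_effect = [1.0, 1.0, -2.0, 0.5, 3.0, -3.5]
average = permutation_average(x_column, y_effect)
```

The first case reproduces the value one-quarter of the worked example; the second gives 2/5, in agreement with (k − 1)/(n − 1) for k = 3 and n = 6.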

Thus the theoretical expected value for the entries in the first three rows of
Table 4 is 2/11 or 0.182, and the average of the 21 “average” coefficients shown
is 0.157. For the next row (factor D) the expected value is 3/11 or 0.273, and the
actual average is 0.254. For the next row (factor E) the expected value is 4/11
or 0.364, and the actual average is 0.413. For the last three rows the expected
value is 1/11 or 0.091 and the actual average is 0.067.

It is clear from Table 4 that the actual influence coefficients range much higher 
than the theoretical mean of (k — 1)/(n — 1). The following approximate result 
is obtained in the Appendix: If ρ² measures the influence exerted by a factor Y
(which has a certain real effect on the yields) on a factor X having no real effect,
the chance distribution of (n − 1)ρ² under random permutation of the columns
of the design matrix is approximately χ² with (k − 1) degrees of freedom, k
being the number of levels of X, provided n ≫ k.

Accepting this χ² approximation, we can make calculations of the following
kind. Suppose we agree that only rarely, say in not more than 5% of cases,
should an influence coefficient exceed some sizable value, say ¼. Then (n − 1)
should be not less than four times the upper 5% point of the χ² distribution
with (k − 1) degrees of freedom. We find:


    if all factors are at 2 levels, n should be not less than 16,
    if some factors are at 3 levels, n should be not less than 25,
    if some factors are at 4 levels, n should be not less than 32,
    if some factors are at 5 levels, n should be not less than 39.







These recommendations can be briefly summarized: take n ≥ 8k, where k is the
greatest number of levels of any factor. Naturally, a more or less cautious person 
will wish to increase or reduce (respectively) the factor 8; but it is clear at any 
rate that if k is large, n should be much larger. 
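
The arithmetic behind these figures can be sketched as follows, under the reading that an influence coefficient should exceed one-quarter in at most 5% of cases, so that n − 1 is at least four times the upper 5% point of χ². The percentage points are the standard tabulated values; the function name and the rounding to the nearest integer are assumptions made for this sketch.

```python
# Upper 5% points of the chi-squared distribution, by degrees of freedom
# (standard tabulated values).
CHI2_UPPER_5_PERCENT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def minimum_n(k, threshold=0.25):
    """Smallest n (not rounded) with (n - 1) * threshold >= upper 5% point
    of chi-squared on (k - 1) degrees of freedom."""
    return 1.0 + CHI2_UPPER_5_PERCENT[k - 1] / threshold


# Compare the rounded bound with the simple rule n = 8k.
comparison = [(k, round(minimum_n(k)), 8 * k) for k in (2, 3, 4, 5)]
for k, bound, rule in comparison:
    print(f"k = {k}: n at least about {bound}; rule 8k gives {rule}")
```

Changing `threshold` shows how a more or less cautious choice moves the multiplier away from 8.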

Thus, in considering the confusion that nonorthogonality causes in the estima- 
tion of responses, we are led to much the same conclusion as before when con- 
sidering significance tests, namely, that caution is advisable in assigning many 
levels to the factors. The fact that a random balance design can just as easily 
accommodate a factor at a dozen levels as at two should not encourage the 
thought that one can just as well have a dozen levels. 


4. GENERAL REMARKS 


The above discussion has concentrated on a simple approach to analysis, 
likely to be adequate if very few of the factors tested have an appreciable effect. 
Graphical and other quick methods have an essential place in statistics, if only 
as a guard against blunders in a more elaborate analysis. But when the carrying 
out of an experiment is expensive, it would usually be foolish to stop at a quick 
analysis, if there is the possibility that the latter may have failed to extract all 
the useful information available. Dempster [2] has made an interesting study 
of the relation between the observations in a random balance experiment (where 
sampling is unconditional and without replacement) and the totality of observa- 
tions that would have been obtained in a complete replication (N observations), 
and has proposed certain analysis methods designed to bring out this relation. 
Beale and Mallows [1] have studied least-squares estimation of constants, when 
the number of the latter is of the same order of size as the number n of observa- 
tions available; they show that good estimation is made possible by a small 
amount of prior information to the effect that most of the constants are small in 
magnitude, and they have discussed the basis for such prior information. Their 
method does not relate specifically to random balance experiments, but might be 
expected to be valuable for them. Satterthwaite has explored several approaches 
to the analysis of screening and other complex factorial experiments; see [4].* 

Studies such as these are needed before we can begin to assess fairly the value 
of random balance designs in comparison with other types. It may well be that a 
systematic design can always be found which, correctly analysed, will be technically
more efficient than a random balance design. But at least two things
can be said to the credit of random balance:

1. Random balance designs are very easy to write down, and the whole 
conception is strikingly simple. They may therefore appeal to experimenters 
with little statistical knowledge, for whom the only practical alternatives 
are simple factorial designs with few factors, or else “one at a time” methods.

2. Random balance has provided a challenge and stimulus to the purveyors
of orthodox systematic designs, in two distinct directions: (a) greater flexibility
in accommodating factors at various numbers of levels, (b) the feasibility of
screening many factors with few observations.


* A draft of [4] came to hand only as the present paper was being completed. It is therefore
not discussed here.


5. ACKNOWLEDGMENT 


I am much indebted to John Tukey, Colin Mallows, and the referees for helpful 
suggestions and improvements. 


REFERENCES 

[1] BEALE, E. M. L. AND MALLOWS, C. L. On the analysis of screening experiments. Annals
of Mathematical Statistics (to appear).

[2] DEMPSTER, A. P. Random allocation designs, I: On general classes of estimation methods.
Annals of Mathematical Statistics (to appear).

[3] SATTERTHWAITE, F. E. Random balance experimental designs. 1957 Middle Atlantic
Conference Transactions, American Society for Quality Control, pp. 61-2.

[4] SATTERTHWAITE, F. E. Random balance experimentation. Technometrics, 1 (1959).

[5] TUKEY, J. W. A quick compact two-sample test to Duckworth's specifications.
Technometrics, 1 (1959), 31-48.

[6] WELCH, B. L. On tests for homogeneity. Biometrika, 30 (1938), 149-58.

[7] YOUNG, L. C. On randomness in ordered sequences. Annals of Mathematical Statistics,
12 (1941), 293-300.


APPENDIX 


Suppose that factor X has $k$ levels and factor Y has $l$ levels. Let $n_{ij}$ denote the
number of experimental units (rows of the design matrix) for which X is at level
$i$ and Y is at level $j$, where $i$ ranges over $0, 1, 2, \ldots, k - 1$ and $j$ over
$0, 1, 2, \ldots, l - 1$. Let


$$n_{i.} = \sum_j n_{ij}, \qquad n_{.j} = \sum_i n_{ij}, \qquad n = \sum_i \sum_j n_{ij}.$$


For a fixed design matrix, let us regard the yields as realizations of independent
random variables. Suppose that Y is the only factor having a real effect, and
that at level $j$ of Y the expected yield is $\mu_j = \bar{\mu} + \alpha_j$, where $\sum_j n_{.j}\alpha_j = 0$. Then
the apparent mean yield at level $i$ of X has expected value


$$\bar{\mu} + \frac{\sum_j n_{ij}\alpha_j}{n_{i.}}.$$


We define the influence coefficient of Y on X to be the ratio of the sum of squares
of deviations from $\bar{\mu}$ of the expected mean yields of X (with weights $n_{i.}$) to the
corresponding sum of squares for the expected mean yields of Y, that is


$$\rho^2 = \frac{\sum_i \bigl(\sum_j n_{ij}\alpha_j\bigr)^2 / n_{i.}}{\sum_j n_{.j}\alpha_j^2}. \tag{5}$$


The symbol $\rho^2$ is used, because the coefficient can be regarded as a squared
multiple correlation coefficient, in the following sense. In the Y-column of the
design matrix, let the level-symbols $(j)$ be replaced by $(\alpha_j)$. In the X-column,
let the level symbols $(i)$ be replaced by arbitrary numbers $(\beta_i)$. Then $\rho^2$ as defined
above is the greatest value of the squared correlation coefficient between the
X and Y columns, for variation of $(\beta_i)$. Thus $\rho^2$ is the squared correlation between
a particular response to Y and that component of the X-effect which has greatest
correlation with the response to Y. For different possible responses to Y, i.e.
different $(\alpha_j)$, we have in general different values of $\rho^2$.
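
As an illustration of the correlation interpretation, the sketch below computes the influence coefficient of equation (5) for a small two-way table of counts and compares it with the squared correlation obtained after recoding the levels. The table, the effects `alpha`, and the function names are hypothetical choices made for this example only; exact rational arithmetic is used so the comparison is not blurred by rounding.

```python
from fractions import Fraction

def influence_coefficient(table, alpha):
    """rho^2 of equation (5); table[i][j] = n_ij, alpha[j] = integer effect of level j of Y."""
    col = [sum(row[j] for row in table) for j in range(len(alpha))]
    assert sum(c * a for c, a in zip(col, alpha)) == 0, "need sum_j n_.j alpha_j = 0"
    num = sum(Fraction(sum(nij * a for nij, a in zip(row, alpha)) ** 2, sum(row))
              for row in table)
    den = sum(c * a * a for c, a in zip(col, alpha))
    return num / den

def squared_correlation(table, alpha, beta):
    """Squared correlation between the X and Y columns with levels recoded as beta_i, alpha_j."""
    n = sum(sum(row) for row in table)
    sx = sy = sxx = syy = sxy = Fraction(0)
    for i, row in enumerate(table):
        for j, nij in enumerate(row):
            sx += nij * beta[i]
            sy += nij * alpha[j]
            sxx += nij * beta[i] ** 2
            syy += nij * alpha[j] ** 2
            sxy += nij * beta[i] * alpha[j]
    return (n * sxy - sx * sy) ** 2 / ((n * sxx - sx ** 2) * (n * syy - sy ** 2))

# Hypothetical design: X at 3 levels (rows), Y at 2 levels (columns), n = 12.
table = [[3, 1], [1, 3], [2, 2]]
alpha = [1, -1]

rho2 = influence_coefficient(table, alpha)
# Recoding each X level by its expected mean yield attains the maximum.
beta_best = [Fraction(sum(nij * a for nij, a in zip(row, alpha)), sum(row)) for row in table]
print(rho2, squared_correlation(table, alpha, beta_best), squared_correlation(table, alpha, [1, 0, 0]))
```

In this example the recoding by expected mean yields reproduces the value from (5), while an arbitrary recoding such as (1, 0, 0) gives a smaller squared correlation.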

Stationary values of $\rho^2$, for variation of the vector $(\alpha_j)$, are easily seen to be
roots $\lambda$ of the $l \times l$ matrix $Q$ whose $(j, j')$th term is


$$Q_{jj'} = \frac{1}{n_{.j}} \sum_i \frac{n_{ij}\, n_{ij'}}{n_{i.}}.$$


The corresponding $(\alpha_j)$ are the eigenvectors, thus:

$$\sum_{j'} Q_{jj'}\,\alpha_{j'} = \lambda\,\alpha_j.$$


One root of $Q$ is 1, with unit eigenvector. The eigenvectors for roots not equal
to 1 all satisfy $\sum_j n_{.j}\alpha_j = 0$, and so represent possible true responses to factor Y.
Thus the stationary values of the influence coefficient of Y on X are all the roots
of $Q$ remaining after a 1 has been deleted. (If we now consider the influence
coefficient of X on Y, the non-zero roots will be the same as for Y on X, but the
number of zero roots will be different if $k \neq l$.)
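
The properties of Q stated above can be verified on a small case. The sketch below builds Q for a hypothetical 3 x 2 table of counts (an assumption of this example, as are the function and variable names) and checks that the unit vector is an eigenvector with root 1. Since Y has only two levels here, the single remaining root is trace(Q) - 1, with eigenvector proportional to (1, -1).

```python
from fractions import Fraction

def q_matrix(table):
    """Q[j][j'] = (1 / n_.j) * sum_i n_ij * n_ij' / n_i. for a table of counts n_ij."""
    k, l = len(table), len(table[0])
    row = [sum(r) for r in table]
    col = [sum(table[i][j] for i in range(k)) for j in range(l)]
    return [[sum(Fraction(table[i][j] * table[i][jp], row[i]) for i in range(k)) / col[j]
             for jp in range(l)]
            for j in range(l)]

# Hypothetical design: X at 3 levels (rows), Y at 2 levels (columns).
table = [[3, 1], [1, 3], [2, 2]]
Q = q_matrix(table)

row_sums = [sum(r) for r in Q]               # Q applied to the unit vector
trace = sum(Q[j][j] for j in range(len(Q)))
lam = trace - 1                              # the one root left after deleting the root 1
alpha = [1, -1]                              # satisfies sum_j n_.j alpha_j = 0 (both n_.j are 6)
Q_alpha = [sum(Q[j][jp] * alpha[jp] for jp in range(len(alpha))) for j in range(len(Q))]
print(row_sums, lam, Q_alpha)
```

For tables with more than two levels of Y the remaining roots would have to be found by a general eigenvalue routine, which is outside the scope of this sketch.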

To find an expected value for the influence coefficient, averaging over all
possible effects $(\alpha_j)$ of Y, it is convenient to assign the following probability
distribution to $(\mu_j)$: $\mu_j$ independently normally distributed with common mean
and with variance proportional (and let us say equal) to $1/n_{.j}$. In the case when
all the $n_{.j}$ are equal, this implies a spherically symmetric distribution for $(\alpha_j)$;
and for unequal $n_{.j}$ a simple distortion of that. It is now straightforward to
show that the denominator of the right-hand side of equation (5) is distributed
like $\chi^2$ with $l - 1$ degrees of freedom, independently of $\rho^2$ itself. Hence
$\mathcal{E}(\rho^2)$ is the ratio of expectations of numerator and denominator, and is found
to be


$$\mathcal{E}(\rho^2) = \Bigl\{\sum_i \sum_j \frac{n_{ij}^2}{n_{i.}\, n_{.j}} - 1\Bigr\} \Big/ (l - 1) = \{\operatorname{trace}(Q) - 1\}/(l - 1)$$

= the average of the $l - 1$ stationary values of $\rho^2$.


All this relates to a fixed design matrix. Suppose now that the entries in the
columns of the design matrix are permuted at random. Then each $n_{ij}$ has a
hypergeometric distribution, well known in the theory of contingency tables;
and the expected value of the above is found to be


$$\mathcal{E}(\rho^2) = \frac{k - 1}{n - 1},$$


the ratio of the degrees of freedom for X to the total degrees of freedom.
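
The hypergeometric step can be checked exactly for given margins. The sketch below is an illustration under assumed margins, not the author's calculation: it computes the expectation of each squared count from the hypergeometric probabilities, forms the expected value of {trace(Q) - 1}/(l - 1), and compares it with (k - 1)/(n - 1). The function names and the example margins are hypothetical.

```python
from fractions import Fraction
from math import comb

def expected_square(n, r, c):
    """E[n_ij^2] when n_ij is hypergeometric with row total r, column total c, grand total n."""
    total = comb(n, c)
    return sum(Fraction(x * x * comb(r, x) * comb(n - r, c - x), total)
               for x in range(max(0, r + c - n), min(r, c) + 1))

def expected_rho2(rows, cols):
    """Expected value of {trace(Q) - 1} / (l - 1) under random permutation of the columns."""
    n = sum(rows)
    assert n == sum(cols), "row and column totals must agree"
    expected_trace = sum(expected_square(n, r, c) / (r * c) for r in rows for c in cols)
    return (expected_trace - 1) / (len(cols) - 1)

# Hypothetical margins: X at k = 3 levels, Y at l = 3 levels, n = 12 units.
rows, cols = [4, 4, 4], [6, 4, 2]
value = expected_rho2(rows, cols)
print(value, Fraction(len(rows) - 1, sum(rows) - 1))
```

Both printed fractions agree, and the same holds for other margins, for example two factors each at two levels with eight units per level.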

This result can be established directly as the mean value for $\rho^2$ under permutation
of the columns of the design matrix, for a fixed effect of Y, without averaging
over a probability distribution for the effect of Y. It is essentially Welch's result
quoted at (2) above, that $\mathcal{E}(U) = 0$. For suppose we calculate the statistic $U$
corresponding to factor X, using not the actual yields but the expected yields
$(\mu_j)$, which depend on the level of Y. We find, from (1) and (5), that


$$U = 1 - \frac{n - 1}{n - k}\,(1 - \rho^2),$$


$$(n - 1)\rho^2 = (n - k)U + (k - 1).$$


From (2) we can immediately obtain an expression for $\operatorname{var}(\rho^2)$, which is nearly
but not quite independent of the response to Y if Y is at more than two levels.
If n > k, the $\chi^2$ approximation to the distribution for U can be re-expressed:
$(n - 1)\rho^2$ has approximately the $\chi^2$ distribution with $(k - 1)$ degrees of freedom.
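
The direct argument can also be illustrated by complete enumeration on a very small design. In the sketch below the X column, the fixed effects of Y, and all function names are hypothetical; U is computed from the relation between U and the influence coefficient given above, not from its original definition at (1), which lies outside this appendix. Every permutation of the X column is visited, so the means are exact.

```python
from fractions import Fraction
from itertools import permutations

def rho2(x_levels, y_effects):
    """Share of the sum of squares of y_effects lying between the levels of X."""
    n = len(y_effects)
    mean = Fraction(sum(y_effects), n)
    total_ss = sum((y - mean) ** 2 for y in y_effects)
    groups = {}
    for x, y in zip(x_levels, y_effects):
        groups.setdefault(x, []).append(y - mean)
    between_ss = sum(Fraction(sum(g)) ** 2 / len(g) for g in groups.values())
    return between_ss / total_ss

def u_statistic(r2, n, k):
    """U expressed through rho^2: U = 1 - (n - 1)/(n - k) * (1 - rho^2)."""
    return 1 - Fraction(n - 1, n - k) * (1 - r2)

# Hypothetical design: n = 6 units, X at k = 3 levels, Y at 3 levels with
# effects alpha = (3, -1, -2), written out once per unit.
x_column = [0, 0, 1, 1, 2, 2]
y_effects = [3, 3, -1, -1, -2, -2]
n, k = len(x_column), len(set(x_column))

values = [rho2(p, y_effects) for p in permutations(x_column)]
mean_rho2 = sum(values) / len(values)
mean_u = sum(u_statistic(v, n, k) for v in values) / len(values)
print(mean_rho2, Fraction(k - 1, n - 1), mean_u)
```

The enumeration runs over 720 orderings, so it is practical only for very small n; for larger designs a random sample of permutations would be the natural substitute.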








Vol. 1, No. 2 TECHNOMETRICS May, 1959


NOTICES 


ADVANCED COURSE IN STATISTICS


The Chemical Division of ASQC announces the advanced course: Statistical
Methods for Control, Inspection and Process Development. This course deals
with basic fundamentals and the most recent advances and is for industrial
statisticians and teachers of industrial statistics. The course will be held at
London, Ontario, Canada, on August 10 through 20, 1959.

The object of the course is to present the more recent advances in application
and technique of statistical methods so as to make them more generally available
to industry and to provide a deeper understanding of the basic philosophy and
pattern of thinking requisite to the most successful and productive application
of these methods in industry.


COURSE CONTENT


This course follows the highly successful ten day course on advanced topics
given at the Harvard Business School last year. It is planned to hold such ad- 
vanced courses periodically. In general, these advanced courses will have a 
two-fold objective: 

(i) to treat already known but basically important ideas particularly in the 
areas of statistical design and quality control in a fundamental and unified 
manner and 

(ii) to discuss the latest developments in the field of industrial statistics and 
experimentation. 


Among the important basic topics to be discussed in the present ten day 
course will be: 


(i) a study of factorial designs from a new and simplified standpoint with 
special reference to their use in screening situations along with an objective 
assessment of the value of random balance designs. 

(ii) the method of least squares with special emphasis on its geometric inter- 
pretation and application to response surface methods and in non-linear situ- 
ations (the fitting of differential equations, etc.,) 

(iii) the analysis of messy data which either from accident or necessity does not 
originate from a complete design 

(iv) practical considerations in the application of Evolutionary Operation in- 
cluding a new method of simplified calculations which can be readily applied 
by plant personnel. 


(v) the basic concept of sampling inspection with special emphasis on recent 
developments. 


Among the new topics to be discussed will be: 


(i) a new approach to quality control. New methods will be discussed by Pro- 
fessor George Barnard who originated them specifically to deal with production 
problems in England. These new methods were the topic of an important paper 
read before the research section of the Royal Statistical Society this year, but 
not yet generally available in the United States. 

(ii) a new method for fractionating three level factorial designs. These new 
designs for the first time provide useful arrangements in three levels requiring 
only a modest number of runs. 

(iii) a recently developed method for obtaining the most appropriate transfor- 
mation in the independent variables. With this new technique the value of re- 
gression and response surface methods is considerably enhanced by allowing 
the possibility of the construction of simple models in transformed variables, 
i.e. $\ln x$, $x^{1/2}$, ..., $1/x$.

(iv) the effect in factorial designs of errors transmitted from the independent 
variables. It is frequently the case that most of the observed variation is due to 
the fact that the levels required of a factorial design are themselves subject to 
experimental error. The effects of errors in the independent variables are discussed.

(v) design of experiments for non-linear models. Frequently a mathematical 
model can be developed from theoretical considerations. Such models are gen- 
erally non-linear in the parameters. The problem is discussed of choosing the 
most appropriate levels of the independent variables in such circumstances. 

The method of presentation will consistently be from the point of view of the 
problem rather than the technique. 


INSTRUCTING STAFF 


Instructing staff will consist of the following statistical consultants all of 
whom will be present on a full-time basis: 


George A. Barnard: professor of statistics at the Imperial College of Science and 
Technology of London University. 


William G. Cochran: professor of statistics at Harvard University. 

George E. P. Box: director of the Statistical Techniques Research Group at 
Princeton University. 

J. Stuart Hunter: a member of the Statistical Techniques Research Group at 
Princeton University. 


This is not a beginner's course in elementary statistical methods and it will be
assumed that the student is familiar with the theory and application of such 
standard techniques as regression and the basic ideas of sampling inspection 
and control. To bound the situation, and speaking generally, persons whose 
study has been limited to the most elementary statistics texts will not find the 
going easy. At the other extreme those who have worked through Davies, “The 
Design and Analysis of Industrial Experiments,” should not expect to find the 
going difficult. 





For detailed information concerning course fees, schedule, housing, etc., write 
Dr. H. P. Andrews 


Swift and Co., Research Laboratories 
Union Stock Yards, Chicago 9, Illinois 


THE STATISTICS OF LIFE TESTING
August 17-28, 1959 
Course Content 


This short course deals with statistical methodology useful in designing, 
conducting, and analyzing life-test experiments. Emphasis is on the practical 
aspects of designing life tests and analyzing life-test data. Topics treated include: 




1. Stochastic models for failure and resulting distributions of time to failure 
2. Design and analysis of life tests assuming the exponential distribution, 
including: 
(a) estimation of mean life and hazard 
(b) sequential testing 
(c) truncated life tests (numbers and/or time) 
(d) analysis of field failure data 
(e) tests for exponentiality 
3. Analysis of life-test data under assumptions that the distribution has one
of the following forms:
(a) Weibull
(b) gamma (Pearson type III)
(c) log normal
(d) normal
(e) truncated versions of the above
4. Inference when the data are incomplete
5. Methods of analysis independent of the underlying distribution 


Faculty 
William R. Allen 
Research Scientist, College of Engineering, New York University 
Benjamin Epstein 
Professor of Mathematics, Wayne State University 
Milton Sobel 
Adjunct Associate Professor of Mathematics, New York University 


Fees 
Tuition for the course is $250.00, which includes the cost of materials. 


For further information write 
David Thomas, Assistant to the Director 
Office of Special Services to Business and Industry 
Gallatin House, New York University 
6 Washington Square North, New York 3, N. Y. 





CHEMICAL DIVISION CONFERENCE


The following is a list of the speakers and titles of the papers scheduled for 
the Third Annual Chemical Division Conference of the American Society for 
Quality Control to be held in Houston, Texas on 24, 25 September 1959. For 
information concerning attendance, reservations, etc., write 


Mr. Jerrold H. Moyer 

Technical Service Director 
Champion Paper Fiber Company 
Pasadena, Texas. 


Mr. Victor B. Shelburne, The Carborundum Company

“Experimenters, Statisticians and Models”

Mr. Lyte D. Paunxz, E. I. duPont de Nemours and Company

“A Field of Application of Covariance Analysis”

Dr. Leroy Folks, Texas Instruments Incorporated

“Optimal Designs for Response Surface Exploration”

Dr. Ellis F. Parmenter, The Champion Paper and Fibre Company

“Estimating the Precision of Test Averages from a Study of Variance Components”

Mr. A. W. Dickinson, Monsanto Chemical Company

“Non-Linear Regression”

Mr. M. A. King, Pittsburgh Plate Glass Company

“Experimental Design: Subdivision of Treatments”

(Co-authored by M. A. King and F. Takenaka)

Dr. G. E. P. Box, Princeton University

“Transformation of the Independent Variables”

Mr. Edwin C. Harrington, Jr., Monsanto Chemical Company

“Industrial Applications of Nonparametric Statistics”

Mr. John D. Hromi, United States Steel Company

“Fractional Factorial Experiments and Their Augmentation”

Mr. Frank W. Kroll, Esso Standard Oil Company

“Effective Quality Control Program for the Industrial Control Laboratory”

Dr. George E. Kimball, Arthur D. Little, Inc.

“The Grasshopper Program”

Mr. C. F. Lewis, Cook Heat Treating Company

“Some Transformations for Graphical Techniques”

Mr. E. S. AtprepDGE, Union Carbide Chemical Company

“Use of Control Charts on Chemical Processes”

Mr. Harold Davidson and Mr. W. C. Quinn, Merck and Company, Inc.

“Determination of Sources of Yield Variation in a Multistep Chemical Process”

Dr. Ellis R. Ott, Rutgers University

“Analysis of Means”

Dr. J. S. Hunter, Princeton University

“Evolutionary Operation”





INTENSIVE SHORT COURSE


Sponsored by Institute of Statistics, North Carolina State College 
To be held at Brevard College, Brevard, North Carolina, August 10 to 15, 1959


Introductory Section 


This section of the course will cover the basic concepts of Statistics at about 
the level of a senior year course for students in engineering or in the physical 
sciences. The emphasis will be on the analysis of experimental data and on the 
role of statistics in the scientific method. Topics to be considered include: 


1. Concept of random variation and its mathematical treatment
2. Normal, binomial and Poisson distributions
3. Sampling and statistical inference
4. Confidence limits and significance testing
5. Correlation and curve fitting by least squares
6. Introduction to analysis of variance and design of experiments


Advanced Section


This section is intended for students who already have a working knowledge
of the topics listed in the Introductory Section. Topics to be covered in the
Advanced Section include:


1. Analysis of variance for factorial and split plot experiments
2. Fractional factorials and confounding
3. Incomplete block designs
4. Variance components
5. Multiple regression analysis
6. Response surface methodology


Dr. R. L. Anderson Dr. R. J. Monroe 
Dr. H. L. Lucas Dr. R. J. Hader 
Mr. Victor Chew 


General Information 


Enrollment will be limited to fifty students in the Elementary and fifty 
students in the Advanced Section. A registration fee of $125.00 will be payable 
either in advance or upon arrival. The course will be held on the Campus of 
Brevard College located approximately 25 miles south of Asheville, N. C. For 
further information write: 


Division of College Extension 
Box 5125 

State College Station 
Raleigh, N. C. 





BIOMETRICS 
JOURNAL OF THE BIOMETRIC SOCIETY 
Vol. 15 No. 2 CONTENTS June 1959 


A Simple Method for Constructing Orthogonal Polynomials 
When the Independent Variable is Unequally Spaced D. S. Robson


The Estimation of Environmental and Genetic Trends 
from Records Subject to Culling C. R. Henderson, 
Oscar Kempthorne, 
S. R. Searle, and 
C. M. von Krosigk 


Experimental Design in the Evaluation of Genetic Parameters Alan Robertson


A Distribution-Free Asymptotic Method of Estimating, 

Testing, and Setting Confidence Limits for Herit- 
ability Lorraine Schwartz 
and Stanley Wearden 


The Regression Analysis of Causal Paths Malcolm E. Turner 
and Charles D. Stevens 


A Class of Two Replicate Incomplete Block Designs J. Roy 
The Centric Systematic Area-Sample Treated as a Random Sample A. Milne 
Sensory Item Sorting N. T. Gridgeman 


QUERIES AND NOTES 


A Confidence Interval on the Abscissa of the Point of 
Intersection of Two Fitted Linear Regressions Marvin A. Kastenbaum 


On the Estimation of the Mean of a Poisson Distribution from 
a Sample with the Zero Class Missing J. O. Irwin 
Differential Regression John T. Webster 


Biometrics is published quarterly. Its objects are to describe and exemplify the use of 
mathematical and statistical methods in biological and related sciences in a form assimilable 
by experimenters. The annual non-member subscription rate is $7.00. Inquiries, orders for 
back issues, and non-member subscriptions should be addressed to: 


BIOMETRICS 

Department of Statistics 
Virginia Polytechnic Institute 
Blacksburg, Virginia 


















PREPARATION OF MANUSCRIPTS 


Manuscripts should be submitted to the office of the editor: J. S. Hunter, 167
Nassau St., Princeton, New Jersey. Each manuscript should be typewritten,
double spaced, with wide margins at sides, top, and bottom. The original should
be submitted with two additional copies, on paper that will take corrections.
Dittoed or mimeographed papers are acceptable only if completely legible.
Footnotes should be avoided and replaced by remarks in the text, or placed in 
an appendix. Preferably, references in the manuscript should appear as (Jones, 
A. B., 1958), and again later in alphabetical order in a list of references. Al- 
ternatively references may be numbered, e.g. [1], as they appear in the manu- 
script and be listed in this sequence in the list of references. In the reference 
list, each reference should contain, in the order indicated, the name and initials 
of the author followed by those of the co-authors, date of publication, title of 
reference, source, volume number and page. References to books should include 
publisher’s name and location. 

Figures, charts, and diagrams should be professionally drawn on plain white
paper or tracing cloth in black India ink twice the size they are to be printed. 
A full page diagram, in print, measures 7.25 X 4.75 inches. 

As far as possible, formulas should be typewritten and symbols not available 
on a typewriter carefully inserted in ink. Authors are asked to keep in mind the 
typographical difficulties of complicated mathematical formulae. The difference 
between capital and lower-case letters should be clearly shown; care should be 
taken to avoid confusion between such pairs as zero and the letter O, the numeral 
1 and the letter l, numeral 1 used as superscript and prime ('), alpha and a, kappa
and k, mu and u, nu and v, eta and n, etc. Subscripts or superscripts should be
clearly below or above the line. Bars above groups of letters (e.g., log x) and 
underlined letters (e.g., 7) are difficult to print and should be avoided. Symbols 
are automatically italicized by the printer and should not be underlined on 
manuscripts. Boldface letters may be indicated by underlining with a wavy line 
on the manuscript; boldface subscripts and superscripts are not available. 
Complicated exponentials should be represented with the symbol exp particu- 
larly when appearing in the text, that is, 


$\exp[(a^2 + b^2)^{1/2}]$ should be used in place of $e^{(a^2 + b^2)^{1/2}}$.


In writing square roots the fractional exponent is preferable to the radical sign. 
Fractions in the body of the text (and when possible in displayed expressions) 
and fractions occurring in the numerators or denominators of fractions are 
preferably written with the solidus; thus 


$(a + b)/(c + d)$ rather than $\dfrac{a + b}{c + d}$.


Authors will ordinarily receive only galley proofs. Fifty offprints without 
covers will be furnished free. Costs for additional reprints and covers can be 
furnished on request. 


























































































































CONTENTS 


TECHNOMETRICS, VOL. 1, No. 2, MAY 1959


Measurements Made by Matching with Known Standards 
W.J. Youden, W.S. Connor and N. C. Severo 


Random Balance Experimentation .........F. E. Satterthwaite 


The Application of Random Balance Designs. .....T. A. Budne 


Discussion of the Papers of Messrs. Satterthwaite and Budne 
W.J. Youden, Oscar Kempthorne, J. W. Tukey, 
G. E. P. Box and J. S. Hunter


Quick Analysis Methods for Random Balance 
Screening Experiments ..........F. J. Anscombe







